
OADP-145: Two-phase DC restore for restic #141

Merged: 1 commit, Jul 18, 2022

Conversation

@sseago (Contributor) commented May 10, 2022

When a backup uses default volumes to restic, disconnect DeploymentConfig
pods from their ReplicationController and label them for later cleanup.
For DeploymentConfigs, set replicas to zero (and paused to false)
and annotate/label them for later cleanup.

After the restore completes, a post-restore bash script must be run
to clean up the restic restore pods and return each DC to its prior configuration.

Note that this is currently only done when the backup sets defaultVolumesToRestic=true -- for the restic annotation case, we don't know in the DeploymentConfig plugin that the DC had its pods disconnected. To enable this case we may need to add another backup annotation to communicate the user's desire to disconnect all DC pods and scale down DCs.
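
For illustration, a backup that would trigger this handling could be created with the default-volumes-to-restic option; the names below are placeholders rather than values from this PR:

velero backup create <backup-name> --include-namespaces <app-namespace> --default-volumes-to-restic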

Also, note that this does not currently handle the case for post-restore hooks on DC pods. For restore-level hooks, this could be added similarly to what's being done w/ the defaultVolumesToRestic config. For pod annotations, the concerns are similar to restic annotations.

Also note that following a successful restore, the script below must be run in order to clean up the restore pods and return each DC to its prior configuration (script location in GitHub still pending):

#!/bin/bash
set -e

OADP_NAMESPACE=${OADP_NAMESPACE:=openshift-adp}

if [[ $# -ne 1 ]]; then
  echo "usage: ${BASH_SOURCE} restore-name"
  exit 1
fi

echo using OADP Namespace $OADP_NAMESPACE
echo restore $1

echo Deleting disconnected restore pods
oc delete pods -l openshift.io/disconnected-from-dc=$1

# For each DC labeled as modified by this restore, read the original replicas/paused
# values back from its annotations and patch them onto the DC spec.
for dc in $(oc get dc --all-namespaces -l openshift.io/replicas-modified=$1 -o jsonpath='{range .items[*]}{.metadata.namespace}{","}{.metadata.name}{","}{.metadata.annotations.openshift\.io/original-replicas}{","}{.metadata.annotations.openshift\.io/original-paused}{"\n"}{end}')
do
    IFS=',' read -ra dc_arr <<< "$dc"
    echo Found deployment ${dc_arr[0]}/${dc_arr[1]}, setting replicas: ${dc_arr[2]}, paused: ${dc_arr[3]}

    cat <<EOF | oc patch dc  -n ${dc_arr[0]} ${dc_arr[1]} --patch-file /dev/stdin
spec:
  replicas: ${dc_arr[2]}
  paused: ${dc_arr[3]}
EOF

done
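
For reference, once the Velero restore completes, a typical invocation would look something like the following (the filename is hypothetical, since the script's final location in GitHub was still pending):

OADP_NAMESPACE=openshift-adp ./dc-post-restore.sh <velero-restore-name>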

@sseago (Contributor Author) commented May 10, 2022

One additional documentation note: The post-restore script will fail for Velero restores with names exceeding 63 characters, since such names are not valid label values.
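
(Kubernetes label values are limited to 63 characters, which is why such restore names fail. A guard like the following could be added near the top of the script; this is only a sketch, not part of the PR:)

# Restore names longer than 63 characters cannot be used as label values
if [[ ${#1} -gt 63 ]]; then
  echo "restore name $1 exceeds 63 characters and cannot be used as a label value"
  exit 1
fi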

@kaovilai (Member):

This is cool

@sseago (Contributor Author) commented May 10, 2022

Note the bash script above has been added under doc/scripts in a PR to oadp-operator.

const (
	DCPodDeploymentLabel       string = "deployment"                        // identifies associated RC
	DCPodDeploymentConfigLabel string = "deploymentconfig"                  // identifies associated DC
	DCPodDisconnectedLabel     string = "openshift.io/disconnected-from-dc" // pod had DC labels removed on restore
Member:

I think pod labels should be oadp specific. Don't want to have cross-project label conflicts.

Member:

or openshift-velero-plugin specific. velero.openshift.io?

Contributor Author:

Is there a more specific label namespace we should be using for OADP labels? Looking at the current code, it seemed like we were using openshift.io and not something like oadp.openshift.io -- @dymurray what is the standard for OADP namespace for labels and annotations?

Contributor Author:

I know for migration use cases we started with openshift.io and eventually moved to migration.openshift.io -- Whatever we use, we should standardize on the appropriate label/annotation namespace for OADP in general. Currently we seem to be using only "openshift.io" and "migration.openshift.io" -- and nothing else.

@sseago (Contributor Author), May 11, 2022:

Hmm. I'm not sure I've ever seen label/annotation names formulated this way. Is there precedent? For MTC we used migration.openshift.io/foo rather than openshift.io/migration/foo

Member:

ok alternatively openshift.io/oadp_disconnected-from-dc

Member:

I don't like it nested, i like the namespacing. oadp.openshift.io/disconnected-from-dc looks good to me.

In general I like moving all our annotations/labels to oadp.openshift.io as that's the namespace our APIs live under.

Contributor Author:

@dymurray or should it be adp.openshift.io? I'm a bit unclear on when to use "oadp" vs "adp" -- i.e. namespace is "openshift-adp" presumably because we don't want "openshift" mentioned twice in the name (once spelled out, once in the acronym). Would the same apply to the ".openshift.io" namespacing?

Contributor Author:

Whatever it is, I'm assuming we'd use this for all new oadp label/annotation names going forward. "oadp.openshift.io" is probably clearest.

@dymurray changed the title from "Two-phase DC restore for restic" to "OADP-145: Two-phase DC restore for restic" on Jul 12, 2022
@sseago (Contributor Author) commented Jul 13, 2022

@dymurray I've updated the PR to reflect the proposed label name changes, as well as updating it to work with the new GetBackup method signature.

p.Log.Infof("[pod-restore] skipping restore of pod %s, has owner references, no restic backup, and no restore hooks", pod.Name)
return velero.NewRestoreItemActionExecuteOutput(input.Item).WithoutRestore(), nil
}

// If pod has both "deployment" and "deploymentconfig" labels, it belongs to a DeploymentConfig
// If defaultVolumesToRestic, remove these labels so that the DC won't immediately delete the restored pod.
Member:

So when does this pod get deleted? never?

So we end up with restored Pod + 1 DeploymentConfig pod?

Contributor Author:

It needs to be cleaned up post-restore. At restore, we don't get 2 pods since the DC is restored with replicas=0. Cleanup should be triggered by running the script added in openshift/oadp-operator#684 -- deletes these pods and scales DC replicas back up to original values.
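
For anyone following along, the intermediate state can be inspected with the labels used by the post-restore script above (the label namespace may still change per the naming discussion in this PR):

# Pods restored for restic but disconnected from their DC/RC
oc get pods --all-namespaces -l openshift.io/disconnected-from-dc=<restore-name>

# DCs restored with replicas=0, carrying their original values in annotations
oc get dc --all-namespaces -l openshift.io/replicas-modified=<restore-name>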

Contributor Author:

The underlying issue is that if you restore a pod and a DC that is not scaled to zero, then the restored pod is immediately deleted. The only way for restic restores or post-restore hooks to work with these pods is to short-circuit that action.

There are two ways to do it. The first approach was a docs-only fix: telling users to first run the restore excluding DCs and ReplicationControllers, then, once restic and hooks were done, restore those missing resources from the same backup. The docs-only fix was later rejected in favor of this more active approach, where disconnected pods are restored along with scaled-down DCs. The disconnection was needed to prevent the DC controller from immediately scaling these pods to zero, as called for by the modified restored DC.

The only other way around this problem is to have users move to Deployments before backup.
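
As a rough sketch, the rejected docs-only workflow mentioned above would have looked something like this (backup and restore names are illustrative, not from this PR):

# Phase 1: restore everything except DCs and ReplicationControllers so that
# restic restores and post-restore hooks can run against the standalone pods
velero restore create restore-phase-1 --from-backup <backup-name> \
  --exclude-resources deploymentconfigs.apps.openshift.io,replicationcontrollers

# Phase 2: once phase 1 is finished, restore the remaining resources from the same backup
velero restore create restore-phase-2 --from-backup <backup-name> \
  --include-resources deploymentconfigs.apps.openshift.io,replicationcontrollers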
