
Cluster migration: User is not able to take a backup after a restore #24

Closed

sowmyav27 opened this issue Sep 4, 2020 · 5 comments

@sowmyav27

On master-head - commit id: 5d5ef3f8f and backup-restore tag: v0.0.1-rc9

  • Take a backup into S3.
  • Deploy Rancher in an HA setup in an EKS cluster.
  • Deploy the backup-restore app.
  • Restore from the backup.
  • Rancher is restored successfully.
  • Create another backup by creating a Backup CR (see the sketch after this list).
  • No backup is taken.
  • Error seen in the backup-restore-operator logs: E0904 03:36:51.157283 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.0/tools/cache/reflector.go:125: Failed to list *v1.Backup: Unauthorized
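For reference, a minimal Backup CR for that step might look like the sketch below. The resource name, credential secret, and S3 details are placeholders for this setup, not values from the report; field names are based on the operator's Backup CRD and should be checked against the chart docs.

    apiVersion: resources.cattle.io/v1
    kind: Backup
    metadata:
      name: s3-backup-after-restore               # hypothetical name
    spec:
      resourceSetName: rancher-resource-set
      storageLocation:
        s3:
          credentialSecretName: s3-creds          # placeholder secret holding the S3 credentials
          credentialSecretNamespace: default
          bucketName: rancher-backups             # placeholder bucket
          folder: rancher
          region: us-west-2
          endpoint: s3.us-west-2.amazonaws.com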
@sowmyav27 sowmyav27 added this to the v2.5 milestone Sep 4, 2020
@mrajashree mrajashree changed the title User is not able to take a backup after a restore Cluster migration: User is not able to take a backup after a restore Sep 4, 2020
@mrajashree
Contributor

mrajashree commented Sep 6, 2020

I think this will be solved by including the backup-restore operator CRDs in the resourceSet. Will check this and update the resourceSet accordingly.
(Update: this is not needed.)

@mrajashree
Contributor

mrajashree commented Sep 13, 2020

The main reason behind the "Unauthorized" errors is the service account tied to the operator pod.
We configure the operator pod to use the serviceaccount that has the cluster-admin role. When this service account is created, Kubernetes also creates a secret (the service account token) associated with it and mounts it into the pod. During a restore, since prune is enabled by default, this secret gets deleted.
So if we restore with prune=false we shouldn't see this error, but that leads to the duplicate "Default" and "System" projects issue.
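One way to see the secret in question (namespace assumed to match a default chart install; pod and service account names need to be filled in):

    # Find the operator pod and the service account it runs as
    kubectl -n cattle-resources-system get pods
    kubectl -n cattle-resources-system get pod <operator-pod-name> -o jsonpath='{.spec.serviceAccountName}'
    # The service account's token secret is what a prune=true restore ends up deleting
    kubectl -n cattle-resources-system get serviceaccount <service-account-name> -o yaml
    kubectl -n cattle-resources-system get secrets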

@mrajashree
Contributor

mrajashree commented Sep 14, 2020

The following steps should be used for restoring to a new cluster for the DR use case; they ensure the operator pod retains its serviceaccount and the associated secret.

  1. Install the backup-restore-operator on the new cluster using the Helm CLI.
  2. Restore from the backup AND set prune=false.
  3. The restore also brings in the secret associated with the Helm release of Rancher from cluster 1, so run helm upgrade instead of helm install to bring up Rancher (see the check after this list).
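A quick way to confirm the restored release is present before upgrading (release and namespace names assumed to match the standard Rancher HA install used here):

    helm list -n cattle-system               # the "rancher" release restored from cluster 1 should show up here
    helm history rancher -n cattle-system    # release history carried over via the restored Helm secret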

Discussed this offline with @cloudnautique, and there is no need to bring up Rancher first on the new cluster and then launch the operator from the dashboard. If we're restoring from a backup, it makes sense for the operator to bring up the entire setup. Will test these steps once again.

@mrajashree
Contributor

mrajashree commented Sep 14, 2020

Steps

  1. helm install backup-restore-operator-crd rancherchart/backup-restore-operator-crd -n cattle-resources-system --create-namespace
  2. helm install backup-restore-operator rancherchart/backup-restore-operator -n cattle-resources-system
  3. kubectl apply -f migrationResource.yaml, where prune=false (see the sketch after this list)
    Helm 3 stores chart release info as a secret, so the rancher chart release from cluster 1 is stored as a secret in the cattle-system namespace, which gets backed up and recreated on the new cluster by the restore. So there is no need to reinstall Rancher; we just need to upgrade it.
  4. (If needed, also follow the steps to install cert-manager from the Rancher HA install docs.)
  5. helm upgrade rancher rancher-alpha/rancher --version 2.5.0-alpha1 --namespace cattle-system --set hostname= --set rancherImageTag=master-head --set webhook.enabled=false
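A sketch of what migrationResource.yaml could contain, assuming the backup file and S3 location used for cluster 1; field names follow the operator's Restore CRD, and the placeholders must be replaced with real values:

    apiVersion: resources.cattle.io/v1
    kind: Restore
    metadata:
      name: restore-migration                       # hypothetical name
    spec:
      backupFilename: <backup-file-from-cluster-1>  # placeholder: the backup produced on cluster 1
      prune: false                                  # the key setting for this migration flow
      storageLocation:
        s3:
          credentialSecretName: s3-creds            # placeholder
          credentialSecretNamespace: default
          bucketName: rancher-backups               # placeholder
          folder: rancher
          region: us-west-2
          endpoint: s3.us-west-2.amazonaws.com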

This should work with the above steps. Moving to test, as no actual change is needed in the operator or the chart.

@sowmyav27
Author

Verified on master-head - commit id: ad697207

  • Deploy a Rancher HA setup.
  • Deploy a couple of user clusters.
  • Deploy the backup-restore chart/app in the local cluster.
  • Take a backup b1, which is saved.
  • Delete the local cluster nodes for this HA setup.
  • Deploy a new RKE cluster (3 nodes, all roles). Add these nodes to the target groups/load balancer.
  • Install the backup-restore-operator chart on the new cluster using the Helm CLI:
helm repo add rancherchart https://charts.rancher.io
helm repo update
helm install backup-restore-operator-crd rancherchart/backup-restore-operator-crd -n cattle-resources-system --create-namespace
helm install backup-restore-operator rancherchart/backup-restore-operator -n cattle-resources-system
