Restore multiple (master) servers from etcd snapshot #3174

Closed · StarpTech opened this issue Apr 11, 2021 · 6 comments

StarpTech commented Apr 11, 2021

Is your feature request related to a problem? Please describe.
Yes, the current documentation only describes how to restore a snapshot in a single-master setup.

Describe the solution you'd like
It should be possible to restore a snapshot and distribute it to all other servers, as RKE does: https://rancher.com/docs/rke/latest/en/etcd-snapshots/#how-restoring-from-a-snapshot-works

Describe alternatives you've considered
Documentation and automation of how to do it safely with the current implementation. My instructions were as follows (a consolidated sketch follows the list):

1. Stop the first master server:

   ```
   sudo systemctl stop k3s
   ```

2. Restore the first master server from a snapshot:

   ```
   ./k3s server \
     --cluster-reset \
     --cluster-reset-restore-path=<PATH-TO-SNAPSHOT>
   ```

3. Connect to each of the other servers and run:

   ```
   sudo systemctl stop k3s
   rm -rf /var/lib/rancher/k3s/data
   sudo systemctl start k3s
   ```

4. Cluster is healthy.
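
A rough end-to-end sketch of the above, assuming a systemd-managed install (service name k3s), SSH access to the peers, and the default data directory; the host names and snapshot path are placeholders, not something prescribed by k3s:

```bash
#!/usr/bin/env bash
# Sketch only -- host names, snapshot path, and the directory to wipe are assumptions.
set -euo pipefail

SNAPSHOT="<PATH-TO-SNAPSHOT>"      # path to the etcd snapshot on the first master
PEERS="server-2 server-3"          # hypothetical remaining master nodes

# 1. Restore the first master server (run locally on that node).
sudo systemctl stop k3s
sudo k3s server \
  --cluster-reset \
  --cluster-reset-restore-path="${SNAPSHOT}"
# When the reset reports it is done, restart k3s normally (without the reset flags).
sudo systemctl start k3s

# 2. Wipe the old etcd state on each remaining master so it rejoins cleanly
#    (the exact directory to remove is discussed further down in this issue).
for host in ${PEERS}; do
  ssh "${host}" 'sudo systemctl stop k3s &&
                 sudo rm -rf /var/lib/rancher/k3s/server/db &&
                 sudo systemctl start k3s'
done

# 3. Verify the cluster is healthy again.
kubectl get nodes
```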

Additional information

The cluster was installed with https://github.com/StarpTech/k-andy

StarpTech changed the title from "How to restore multiple (master) servers from etcd snapshot?" to "Restore multiple (master) servers from etcd snapshot" on Apr 11, 2021
brandond (Contributor) commented:

We don't have a central coordination tool like RKE, and no plans to create one. After restoring the snapshot to the first server, you should remove the database files on the other servers and rejoin them to the cluster.

StarpTech (Author) commented Apr 11, 2021

Hi @brandond, so the workaround is correct? What's the long-term strategy for handling restore scenarios in large clusters?

brandond (Contributor) commented Apr 12, 2021

Long term, automation of this sort will likely be handled by Rancher cluster operator orchestration.

StarpTech (Author) commented Apr 12, 2021

Could we document the restore procedure for the current implementation with multiple master nodes? I'm not sure if this is exactly the right approach.

brandond (Contributor) commented Apr 12, 2021

Follow the restore instructions from the docs. When the restore is complete you will see a message on the console (printed by this line in the code):

```go
logrus.Infof("Etcd is running, restart without --cluster-reset flag now. Backup and delete ${datadir}/server/db on each peer etcd server and rejoin the nodes")
```

Follow those instructions: stop k3s on the other servers (if it is still running), back up and delete the referenced directory, then start k3s again to rejoin the cluster.
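
A minimal sketch of those steps on each peer etcd server, assuming a systemd-managed install (service name k3s) and the default data directory /var/lib/rancher/k3s:

```bash
# Run on every peer etcd server except the one the snapshot was restored on.
sudo systemctl stop k3s                        # stop k3s if it is still running
sudo mv /var/lib/rancher/k3s/server/db \
        /var/lib/rancher/k3s/server/db.bak     # back up and remove the old etcd data
sudo systemctl start k3s                       # start k3s again to rejoin the cluster
```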

StarpTech (Author) commented:

Thanks, I hadn't noticed that last line.
