[QUESTION] How to recover from a catastrophic failure? #2714
You could refer to the doc on how to recover the data from the replica.
Thanks for the reply @jenting. I just tested the documentation approach. I am able to read the content of the replicas, but I still cannot see the volume in the Longhorn UI. Basically it allows me to access the data, but I cannot reuse it as part of the Longhorn system. Is there a way I can add the volume back to the Longhorn system and see it in the UI (with the replicas)? In a nutshell, I would like to be able to re-use the Longhorn system without having to migrate all my data to new volumes (that would take an awful amount of time).
Currently, I am looking at this: https://github.com/longhorn/longhorn-engine#running-a-controller-with-multiple-replicas I think that any solution would include those steps. What I am trying to understand now is:
While following the above documentation I am able to start the longhorn container in Docker and the block device gets created, but I cannot mount it, since it says: `mount: /mnt/pv: /dev/longhorn/pv-d853ac78 already mounted or mount point busy`. I have checked and it does not seem to be a multipath issue.
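For anyone hitting the same error: "mount point busy" on a `/dev/longhorn/*` device is often caused by `multipathd` claiming the block device. A sketch of how to check and exclude it (the device path is the one from this thread; the blacklist pattern is an assumption and may need tuning for your environment):

```shell
# Check whether multipathd has claimed the Longhorn block device
# (a common cause of "already mounted or mount point busy").
lsblk /dev/longhorn/pv-d853ac78   # look for a multipath holder stacked on top
multipath -ll                     # list active multipath maps

# If multipathd claimed the device, blacklist it and restart the daemon.
# The devnode pattern below is an assumption; adjust for your setup.
cat >> /etc/multipath.conf <<'EOF'
blacklist {
    devnode "^sd[a-z0-9]+"
}
EOF
systemctl restart multipathd
mount /dev/longhorn/pv-d853ac78 /mnt/pv
```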
Ok. So I started going through the code to see how longhorn "detects" a volume. In a nutshell, my understanding is:
My investigation is leaning towards 3 possibilities: 1 - Start a clean Longhorn installation (following the documentation) and call some manager API to recreate each volume, reusing the existing replicas instead of creating new ones. Not sure if those APIs exist. Numbers 1 and 2 are what I am investigating now. Since this is my first time going through this code, I would appreciate it if anyone familiar with the code could provide some guidance/help to speed my investigation up. PS.:
I think we should add documentation on how to recover the local replica data and perform a migration/movement to a newly created Longhorn volume. The workflow would be like this:
@alkmim Thank you for working on this enhancement. I think the above steps should work without running through longhorn-ui -> longhorn-manager -> longhorn-engine. But if you would like to get more familiar with Longhorn, you could probably go through the Longhorn architecture doc first.
Possibly we could implement this requirement along with #2461, because when performing the disk replica scanning, it is possible to know how to recover the volume from the replica or clean up the orphan replica. But it is decided by the admin which replicas to recover and which replicas to clean up.
Hello @jenting. Thanks for the response. I like your idea in #2714 (comment), but I am still concerned about the fact that on every full reboot of the cluster the entire data would have to be transferred to new volumes, since this can lead to huge downtime for big datasets, plus the I/O overhead (network and disk). Since the replicas are already there and healthy, and the longhorn-engine can start the controller referencing existing replicas (https://github.com/longhorn/longhorn-engine#running-a-controller-with-multiple-replicas), wouldn't it be better for the disk replica scanning to create volumes re-using the existing replicas (if they are healthy, of course) instead of creating new volumes and syncing the content? PS.:
Not sure what the reason is for a full reboot of the cluster, even the Kubernetes control plane nodes? Besides that, thanks for your willingness to contribute to the Longhorn code. You could refer to these also:
But if you still have an idea/problem with the Longhorn code, we could discuss it on Rancher Slack, or let us arrange a call to clarify/discuss the ideas (but we need to know your timezone first 😉).
I can see a couple of scenarios for a full reboot (including by design). In my case, I have a requirement that everything in the cluster should be rebuilt from scratch, using only the data disks. One reason that would impact most people would be a power outage. I am not sure how etcd keeps its information. If the information is persisted across reboots in the filesystem, I can make sure it is always in sync on the data disks. It could be a good direction. I will read the mentioned documents, but I would also be happy to discuss it on Rancher Slack or to have a call. Can you invite me to the workspace? It would be good to avoid spending effort in the wrong direction, since I do not know Longhorn's long-term plans and I am also very new to the code. Again, no need to worry about my timezone. I can talk in any timezone.
Another viable option for my use case, would be to follow this workflow:
If this solution works, I would still need to know, for every volume/replica, which Custom Resources I need to create to be sure the Manager will use them.
Have you considered taking a backup to an external S3 provider before rebooting the cluster? That way, even if you rebuild the cluster from scratch, you could restore the data from the external backup. Or, if all nodes, including control plane nodes and worker nodes, are rebooted, you could choose to use an external etcd so that the Longhorn Custom Resources (CRs) are kept after the nodes reboot or are rebuilt from scratch. This would make the Kubernetes cluster data persistent. After that, even if the Longhorn components (longhorn-manager, longhorn-instance-engine, longhorn-instance-replicas, ...) are gone, once the cluster is back and using the external etcd, Longhorn could still access the original data with the existing Custom Resources (CRs) plus the replicas on the nodes.
I have the data available to be restored. The problem with this approach is that it takes a long time to restore even for small amounts of data.
Having an external etcd is not an option, since the cluster is entirely self-contained; the reboot will cause etcd to be rebooted too. In summary, the full reboot will happen. I will still have all the data, and I can have specific folders on the nodes persisted across reboots (etcd folders, maybe?). I cannot work around the reboot; everything will be rebooted. The scenario is the same as a power loss in the entire system.
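If specific folders can be persisted across reboots, one option in this direction is snapshotting etcd before the planned reboot so the Longhorn CRs survive. A hedged sketch (endpoints and certificate paths are assumptions for a kubeadm-style cluster; adjust for your distribution):

```shell
# Snapshot etcd so the Longhorn CRs survive a full rebuild; store the
# snapshot on a persisted data disk. Paths/endpoints are assumptions.
ETCDCTL_API=3 etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert   /etc/kubernetes/pki/etcd/server.crt \
  --key    /etc/kubernetes/pki/etcd/server.key \
  snapshot save /var/lib/longhorn/etcd-snapshot.db

# After the rebuild, restore into a fresh data dir before starting etcd:
# ETCDCTL_API=3 etcdctl snapshot restore /var/lib/longhorn/etcd-snapshot.db \
#   --data-dir /var/lib/etcd
```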
Right now, I am investigating 2 approaches:
@jenting Please let me know how we can set up a call. I will be glad to provide more information if required.
I just went through all the resources created when I created a new volume. It seems that these would be the resources:
I will test dumping all of these resources into YAML and then recreating the volumes by running `kubectl apply -f`. I will update here if it works.
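The dump-and-reapply test described above can be sketched as follows (the namespace and CRD names match a default Longhorn install; note that `status` and `metadata.resourceVersion` fields may need to be stripped from the dump before re-applying):

```shell
# Dump the Longhorn CRs that describe the volumes.
kubectl -n longhorn-system get \
  volumes.longhorn.io,engines.longhorn.io,replicas.longhorn.io \
  -o yaml > longhorn-crs-backup.yaml

# On the rebuilt cluster, after installing Longhorn, re-apply them.
kubectl apply -f longhorn-crs-backup.yaml
```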
Probably you could consider using Rancher cluster restore or Velero to back up and restore the etcd data and Longhorn CRs.
Hello @jenting, thanks for your reply. Rancher cluster restore and Velero are a good start, although I will have to figure out a way to restore only the Longhorn-related info from the etcd backup. Feature #1455 sounds promising. Meanwhile, I continued with the tests I mentioned above. Below are my findings.
YAML used: longhorn_volume.yaml

Some notes:
@jenting Do you see anything that could cause the LH Replicas and LH Engines to be deleted as soon as I create them?
cc @shuo-wu
I think I found a good procedure. Longhorn was able to recognize the disks correctly. One of the tricks I was missing was to properly set the `metadata.labels` on the replicas and engine. Here is the overall approach:
Note:
Thanks @shuo-wu and @jenting for your help! Please let me know your thoughts on this approach. I also hope this helps someone else facing the same requirements.
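For reference, a hypothetical sketch of what a hand-written Replica CR with the `metadata.labels` trick might look like. All names, paths, and some field names here are assumptions modeled on what a default Longhorn installation creates; compare against the CRs dumped from a freshly created volume before relying on this:

```yaml
apiVersion: longhorn.io/v1beta1
kind: Replica
metadata:
  name: pv-d853ac78-r-1a2b3c4d        # hypothetical replica name
  namespace: longhorn-system
  labels:
    longhornvolume: pv-d853ac78       # must match the Volume CR name
    longhornnode: node-1              # node holding the replica data
spec:
  volumeName: pv-d853ac78
  nodeID: node-1
  dataPath: /var/lib/longhorn/replicas/pv-d853ac78-1a2b3c4d  # on-disk replica dir
```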
Stopping the longhorn-manager pods and modifying the UUID is a little bit tricky, but I think it will work. As I mentioned above, you may need to handle the restored-volume field.
Thanks for the feedback @shuo-wu. In my case, I am not using restored volumes (at least I think I am not), so I left that field empty. I have one more concern:
Thanks for all the feedback!
After replication is done, all 4 replicas should be in a healthy state. Currently, Longhorn does not automatically scale down the replicas; the user can manually delete one of the healthy replicas. To add a replica to the volume, you can first increase the replica count for the volume and, after successful replication, decrease the replica count to the original value.
If there is no restored volume, you can ignore the field. You can delete any replicas you don't want to retain, except for the last healthy one. Longhorn will detect the out-of-sync replicas each time the volume is attached.
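The increase-then-decrease replica-count trick can be sketched with `kubectl patch` (the volume name here is hypothetical; `spec.numberOfReplicas` is the field the Longhorn Volume CR uses):

```shell
# Temporarily raise the replica count so Longhorn rebuilds a new replica...
kubectl -n longhorn-system patch volumes.longhorn.io pv-d853ac78 \
  --type merge -p '{"spec":{"numberOfReplicas":4}}'

# ...wait until the new replica is healthy, then drop back to the original.
kubectl -n longhorn-system patch volumes.longhorn.io pv-d853ac78 \
  --type merge -p '{"spec":{"numberOfReplicas":3}}'
```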
Thanks for your reply @cclhsu and @shuo-wu. I did some reboots (for testing) and it seems that the recreation works following the process described in my previous post. Feel free to close this issue; if I find any issue related to this procedure, I will update it here. Although, I think it would be a good feature for Longhorn to be able to detect volumes based on the replicas located on the nodes during boot time.
I also think it would be a good feature, even if it were triggered manually by calling an API or pressing a button in the frontend. We had a huge Longhorn/cluster outage a few days ago and I spent nights and days recovering all the volumes (more than 250). The backups worked, but one backup was defective and one was missing. I don't know why, but since I had the data on the hard disks of the old nodes, I wanted to recover from there. At the moment I'm reading this thread and figuring out how to get the data off the old servers' hard disks (from the filesystem) and import it again into the new cluster with Longhorn installed.
I am not sure if it is helpful, but the procedure I tested seems to be working so far (see here for reference: #2714 (comment)). I know it is a manual procedure, but it seems to work just fine. I actually wrote an automated Python script for my environment, though. Not very hard.
Question
How should I recover the Longhorn system if all my nodes die and the only backup I have is /var/lib/longhorn from all my nodes?
Right now, whenever I simulate this kind of failure and re-install Longhorn, I can clearly see the replicas on the nodes, but Longhorn does not see them.
Environment:
Additional context
Since this is a simulation, I can backup any other folder/config right before the failure.