Volume snapshots with LVM #285

Closed
micw opened this issue Mar 11, 2022 · 11 comments

@micw

micw commented Mar 11, 2022

It looks like, despite the high demand, Hetzner is not going to implement volume snapshots for cloud volumes (see #88 for the discussion). So here's a different approach.

The CSI driver could implement snapshots based on LVM. This would require the following changes (a rough sketch of the underlying LVM operations follows the list):

  • use LVM on newly created volumes
  • allow configuring a reserved size for volume snapshots
  • the volume size requested from Hetzner Cloud must be the volume size plus the reserved snapshot size
  • the resize code must also resize the LVM volume
  • the resize code must change the cloud volume size when the reserved snapshot size is changed
  • code to create/delete snapshots must be added
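
For illustration, the LVM operations such snapshot support would wrap might look roughly like this. This is a hypothetical sketch, not the driver's implementation; the device name, volume group name, and sizes are all assumptions.

# Hypothetical sketch of the LVM operations the driver would wrap.
# /dev/sdb is an attached Hetzner Cloud volume; all names and sizes are assumed.
pvcreate /dev/sdb
vgcreate vg-csi /dev/sdb
# Use only part of the cloud volume for data, keeping the rest reserved
# for snapshots (e.g. 40G data + 10G reserved on a 50G cloud volume).
lvcreate -L 40G -n data vg-csi
mkfs.ext4 /dev/vg-csi/data
# Create a point-in-time snapshot backed by the reserved space.
lvcreate -s -L 10G -n data-snap /dev/vg-csi/data
# Delete the snapshot when it is no longer needed.
lvremove -y /dev/vg-csi/data-snap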

What do you think about this?

@samcday
Contributor

samcday commented Mar 29, 2022

Without needing to implement any support in this CSI driver, perhaps this could already be achieved with the openebs/lvm-localpv driver?

That is, you would attach some hcloud volumes to a Node, then bind those into an LVM setup with openebs/lvm-localpv. At that point you can do thin provisioning on top of the hcloud volume(s), LVM snapshot/restore, and so on.

(This is all pure speculation, I only just toyed around with the lvm-localpv recently, and have not tried using it specifically with hcloud volumes)
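
To make the idea concrete, a purely illustrative sketch (device names, the volume group name, and the StorageClass name are assumptions): first build a volume group on the node from the attached hcloud volumes, then point an lvm-localpv StorageClass at it.

# Hypothetical: turn attached hcloud volumes into an LVM volume group on the node.
pvcreate /dev/sdb /dev/sdc
vgcreate lvmvg /dev/sdb /dev/sdc

# StorageClass letting openebs/lvm-localpv provision out of that volume group,
# with thin provisioning enabled so LVM snapshots are cheap.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-lvmpv
provisioner: local.csi.openebs.io
parameters:
  storage: "lvm"
  volgroup: "lvmvg"
  thinProvision: "yes"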

@micw
Author

micw commented Mar 29, 2022

While it actually might be possible, one would lose all the features that the Hetzner CSI brings (automatic volume provisioning, deprovisioning, volume resizing, auto-attach to the correct node). So I fear that's not a solution.

@samcday
Contributor

samcday commented Mar 29, 2022

I think you could probably still take advantage of this csi-driver. For example, you could schedule a StatefulSet with persistentVolumeClaims and appropriate node affinity to have the volumes provisioned + attached + resized (a sketch follows).
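
A minimal sketch of that idea (all names are assumptions; hcloud-volumes is this csi-driver's default StorageClass): the PVC is provisioned, attached, and resizable through this csi-driver, exposed as a raw block device, and pinned to one node for whatever builds the LVM layer on top.

# Hypothetical sketch: a StatefulSet whose PVC is provisioned and attached
# by this csi-driver and pinned to one node; all names are assumptions.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: lvm-host
spec:
  serviceName: lvm-host
  replicas: 1
  selector:
    matchLabels:
      app: lvm-host
  template:
    metadata:
      labels:
        app: lvm-host
    spec:
      nodeSelector:
        kubernetes.io/hostname: worker-1 # assumed node name
      containers:
        - name: lvm-host
          image: registry.k8s.io/pause:3.9 # placeholder workload
          volumeDevices:
            - name: data
              devicePath: /dev/hcloud-data # raw block device for the LVM layer
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        volumeMode: Block
        storageClassName: hcloud-volumes
        resources:
          requests:
            storage: 50Gi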

@micw
Author

micw commented Mar 29, 2022

I'm sure that it's possible to build something that somehow works. But my intention with this ticket is to have reliable, working, and supported snapshot support that can be used in production. Best would be if Hetzner provided it as a feature of their volumes - but that does not seem to be happening. An alternative to me would be to have it in the Hetzner CSI provider. The next alternative for me is to use a different hoster for stuff that needs snapshots. For me, that comes long before using any non-standard workaround.

@github-actions

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

@jampy

jampy commented Jul 29, 2024

@micw I'm also interested in the same feature and I think it would be greatly appreciated by many users. Sadly this issue has been stale for a while. What did you end up doing, or did you find a different, elegant solution?

@micw
Author

micw commented Jul 30, 2024

No, I have no solution. The issue was auto-staled by a stupid bot.

@apricote
Member

apricote commented Aug 8, 2024

We do not plan on supporting LVM Snapshots in the csi-driver. I have added +2 to our internal feedback tracker for Volume Snapshots.

@bilbof

bilbof commented Oct 11, 2024

FYI, Velero is a decent workaround for this. You can back up PVCs with Velero using its FileSystemBackup feature.

From an end-user perspective, it's a question of adding this to pods (rather than to PVCs as with csi-provided snapshots):

metadata:
  annotations:
    backup.velero.io/backup-volumes: my-volume # name of a pod volume
  labels:
    backup.velero.io/backups-enabled: "true" # optional, but useful for making PVC backups opt-in

From an administrator perspective, you have to:

  1. Install Velero (their Helm chart is decent)
    1. Enable the node agent daemonset
    2. Configure Velero to connect to S3
  2. Add labels/annotations to pods to enable volume backups
  3. Run backups manually with velero backup or on a schedule with their Schedule CRD
  4. Restore manually with velero restore (see the sketch below)
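
The manual flow might then look like this (a sketch; the backup name, schedule, and label selector are assumptions building on the labels above):

# One-off backup of pods opted in via the label above:
velero backup create pvc-backup --selector backup.velero.io/backups-enabled=true
# Or run it nightly via the Schedule CRD:
velero schedule create pvc-nightly --schedule "0 3 * * *" \
  --selector backup.velero.io/backups-enabled=true
# Restore from a given backup:
velero restore create --from-backup pvc-backup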

Under the hood, this is what happens:

  1. Velero runs a controller pod and a daemonset (node agent) in-cluster
  2. The node agent is responsible for reading the node filesystem (bound PVCs) and shipping backups to S3
  3. Restores are managed by the controller, which injects an init container that syncs the bound volume with the backup in S3

Notes:

  • Velero is not as good as built-in PVC snapshots - it's more complex and less reliable than Hetzner implementing it at the hardware level
  • Hetzner's S3 is in beta; I got access and was able to connect Velero to their S3. You need to use the AWS S3 plugin but copy the Minio config (see below)
# volumeSnapshotLocation config for Velero with Hetzner S3:
config:
  region: fsn1
  s3Url: https://fsn1.your-objectstorage.com
  checksumAlgorithm: ""
# Note: you need more config changes to the chart values than this, but this is a crucial bit

@jampy

jampy commented Oct 13, 2024

@bilbof IMHO this issue is about having block-level point-in-time snapshots of Hetzner volumes (which Hetzner currently does not offer natively).

Snapshots are an essential feature that backup tools like Velero can use to produce consistent backups. Without them, not even Velero can deal well with filesystem changes during a backup.

Without snapshot support, CSI drivers like Hetzner's make it impossible to create consistent backups of live filesystems. Databases are an extreme example.

The OP mentions a possible solution: adding an LVM layer on top of the Hetzner volume, since LVM can provide snapshots on top of any block device. Because this adds a significant level of complexity, the OP asked whether it could be integrated into Hetzner's CSI driver. Either way, a big showstopper would be resolved by Hetzner itself.

That said, the best solution by far would still be for Hetzner to provide a native snapshotting mechanism, but the company has not signaled any intention to implement it.

@bilbof

bilbof commented Oct 13, 2024

@jampy I agree, this was just to provide a workaround for a fast in-cluster restore.

For DBs, I think the best solution is DB-specific tooling, e.g. pgBackRest for Postgres, which operators like Crunchy Data's bake in.
