
[FEATURE] Application consistent snapshot/backup, Volume Group #2128

Open
yasker opened this issue Dec 21, 2020 · 14 comments

@yasker
Member

yasker commented Dec 21, 2020

Is your feature request related to a problem? Please describe.
Currently, Longhorn can take crash-consistent snapshots/backups. But for applications like databases, quiescing is needed to create an application-consistent snapshot, so that the application flushes all in-memory data to disk before the snapshot is taken.

Describe the solution you'd like
Provide a way for users to quiesce the workload before taking the snapshot/backup.

Describe alternatives you've considered
Users can also script the backup using the CSI snapshotter and run quiescing/unquiescing around the snapshot (see the sketch below). But that won't work with recurring snapshots/backups scheduled by Longhorn.
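
For reference, a minimal sketch of that manual flow, assuming a Longhorn-backed PVC named `mysql-pvc` and a VolumeSnapshotClass named `longhorn-snapshot-vsc` (both names are illustrative): quiesce the application, create the CSI snapshot, wait for it to become ready, then unquiesce.

```yaml
# Created by the backup script after quiescing the application.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mysql-pvc-snap
  namespace: default
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc  # illustrative class name
  source:
    persistentVolumeClaimName: mysql-pvc
```

The script would poll the snapshot's `status.readyToUse` field and unquiesce the application once it is true.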

Additional context
An application is normally composed of multiple workloads, so we need to consider the volume group scenario as well.

@yasker yasker added kind/feature, component/longhorn-manager, priority/0, require/auto-e2e-test, require/doc, require/lep, require/API-design, area/volume-backup-restore labels Dec 21, 2020
@yasker yasker added this to the v1.2.0 milestone Dec 21, 2020
@innobead
Member

@jenting Please help with this. Thanks.

@joshimoo
Contributor

linking the backup refactor issue #1761

@janeczku
Contributor

janeczku commented Feb 4, 2021

Application state may be distributed across multiple persistent volumes (e.g. a sharded DB, distributed indexes). An application-consistent backup must then ensure that snapshots are performed in a point-in-time-consistent manner across all volumes.

To offer such a feature, Longhorn could allow users to create "volume groups" and define on-demand or scheduled backup plans for the whole group instead of for individual volumes.

@innobead
Member

innobead commented Feb 8, 2021

> Application state may be distributed across multiple persistent volumes (e.g. a sharded DB, distributed indexes). An application-consistent backup must then ensure that snapshots are performed in a point-in-time-consistent manner across all volumes.
>
> To offer such a feature, Longhorn could allow users to create "volume groups" and define on-demand or scheduled backup plans for the whole group instead of for individual volumes.

Thanks for the comment. We discussed a similar idea before. The goal is to introduce a mechanism that supports atomic operations across all the volumes of an application and, as you said, not just on individual volumes.

Please follow up on the discussion and proposals in the near future.

@innobead innobead modified the milestones: v1.2.0, v1.3.0 Apr 29, 2021
@innobead innobead changed the title [FEATURE]Application consistent snapshot/backup [FEATURE] Application consistent snapshot/backup Sep 21, 2021
@innobead innobead added the highlight label Oct 21, 2021
@innobead
Member

Hey team! Please add your planning poker estimate with ZenHub @jenting @PhanLe1010 @shuo-wu @joshimoo

@joshimoo
Contributor

There is a lot to consider here and high uncertainty: volume groups, pre/post hooks (executed inside the workload pods), error handling, etc. Now that we have backup CRDs, we can also consider exposing (syncing) these CRDs as CSI snapshots via creation of the appropriate CSI VolumeSnapshot and VolumeSnapshotContent resources.
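
For illustration, a rough sketch of what that syncing could look like, using the pre-provisioned (static) CSI snapshot flow. `driver.longhorn.io` is Longhorn's CSI driver name; the `bak://<volume>/<backup>` handle format and all object names here are assumptions for illustration:

```yaml
# Pre-provisioned VolumeSnapshotContent pointing at an existing Longhorn backup,
# plus the namespaced VolumeSnapshot that binds to it. Sketch only.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: existing-backup-content
spec:
  deletionPolicy: Retain                 # keep the backup if this object is deleted
  driver: driver.longhorn.io
  source:
    snapshotHandle: bak://pvc-volume/backup-abc123   # assumed handle format
  volumeSnapshotRef:
    name: existing-backup
    namespace: default
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: existing-backup
  namespace: default
spec:
  source:
    volumeSnapshotContentName: existing-backup-content
```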

@innobead innobead assigned c3y1huang and unassigned derekbit Dec 23, 2021
@innobead innobead added the reprioritization-needed label Jan 10, 2022
@shuo-wu
Contributor

shuo-wu commented Feb 16, 2022

Leaving a note here:
Velero relies on annotation hooks to execute commands before/after a backup. We can do something similar to achieve application-consistent snapshots/backups (run sync and freeze the filesystem before the snapshot/backup).
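
For context, this is roughly what Velero's annotation hooks look like on a workload pod; the image, paths, and names are illustrative, and `fsfreeze` needs sufficient privileges inside the container:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mysql
  annotations:
    # Velero runs these commands inside the named container before/after backing
    # up the pod's volumes (assuming fsfreeze is available in the image).
    pre.hook.backup.velero.io/container: mysql
    pre.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--freeze", "/var/lib/mysql"]'
    post.hook.backup.velero.io/container: mysql
    post.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--unfreeze", "/var/lib/mysql"]'
spec:
  containers:
    - name: mysql
      image: mysql:8
      securityContext:
        privileged: true               # fsfreeze requires privileged access
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mysql-pvc
```

A Longhorn equivalent could follow the same pattern: a pre hook that syncs/freezes the filesystem and a post hook that unfreezes it.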

Next: investigate Kanister.

@innobead innobead modified the milestones: v1.3.0, v1.4.0 Mar 18, 2022
@innobead innobead changed the title [FEATURE] Application consistent snapshot/backup [FEATURE] Application consistent snapshot/backup, Volume Group May 20, 2022
@R-Studio

R-Studio commented Jun 9, 2022

This feature would be awesome! 👍🏽😃
@innobead can you estimate when this feature will be implemented? (I ask because it has been an open feature request since Dec 21, 2020.)

@innobead
Member

ref: kubernetes/enhancements#3476

@innobead innobead added area/kubernetes, area/csi labels Jan 14, 2023
@innobead
Member

innobead commented Feb 7, 2023

ref: kubernetes/enhancements#3476

container-storage-interface/spec#519
https://github.com/container-storage-interface/spec/releases/tag/v1.8.0 (introduces VolumeGroupSnapshot-related RPCs under the new GroupController service as alpha)

@innobead innobead removed the reprioritization-needed label Feb 8, 2023
@innobead innobead modified the milestones: v1.5.0, v1.6.0 Feb 8, 2023
@innobead
Member

An interesting project that can be referenced as well: https://github.com/kanisterio/kanister.

@innobead innobead assigned c3y1huang and unassigned PhanLe1010 Aug 16, 2023
@c3y1huang
Contributor

c3y1huang commented Aug 24, 2023

> container-storage-interface/spec#519 https://github.com/container-storage-interface/spec/releases/tag/v1.8.0 (introduces VolumeGroupSnapshot-related RPCs under the new GroupController service as alpha)

Note:

  • The Kubernetes VolumeGroupSnapshot supports PVCs only and is triggered by the PVC label group=<group_name>. This means a PVC can only be associated with a single group, which is probably not ideal for RWX PVCs that might be used by multiple workloads (see the sketch after this list).
  • The most recent release of kubernetes-csi/external-snapshotter, v6.2.2, doesn't include an implementation of the volume group snapshot APIs; the feature currently sits in the master branch.
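
A sketch of the label-selector trigger described above, based on the alpha API in the external-snapshotter master branch (field and group names may change before release; the class name is illustrative):

```yaml
# Snapshots all PVCs labeled group=my-app in the namespace as one group.
apiVersion: groupsnapshot.storage.k8s.io/v1alpha1
kind: VolumeGroupSnapshot
metadata:
  name: my-app-group-snapshot
  namespace: default
spec:
  volumeGroupSnapshotClassName: longhorn-group-snapclass   # illustrative
  source:
    selector:
      matchLabels:
        group: my-app
```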

@innobead
Member

innobead commented Aug 24, 2023

> container-storage-interface/spec#519 https://github.com/container-storage-interface/spec/releases/tag/v1.8.0 (introduces VolumeGroupSnapshot-related RPCs under the new GroupController service as alpha)
>
> Note:
>
> • The Kubernetes VolumeGroupSnapshot supports PVCs only and is triggered by the PVC label group=<group_name>. This means a PVC can only be associated with a single group, which is probably not ideal for RWX PVCs that might be used by multiple workloads.

This new feature seems similar to the current VolumeSnapshot, which means users can already back up RWX volumes used by multiple workloads, so that concern isn't specific to this new interface. It is a general interface, so basically we don't need to worry about whether it is ideal for different volume access modes.

I see, then we have two phases here. In the first phase (1.6), let's introduce an internal mechanism like AppBackup and AppRestore, or VolumeGroupBackup and VolumeGroupRestore if we want to make it more general. In the second phase (after 1.6), when VolumeGroupSnapshot is ready upstream, we can do the same backup & restore, but with the trigger point on the CSI path instead of users or recurring jobs creating the AppBackup/AppRestore directly. (A hypothetical sketch of such a CR follows the summary below.)

To sum up:

  • The creator/driver of an AppBackup will be a user, a recurring job, or CSI. (DR should naturally be the same.)
  • The creator of an AppRestore will be a user or CSI.
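
To make the first phase concrete, a purely hypothetical sketch of what a VolumeGroupBackup CR could look like; none of these fields exist in Longhorn today, and the final design would come from the LEP:

```yaml
# Hypothetical sketch only -- illustrating the proposed internal mechanism.
apiVersion: longhorn.io/v1beta2
kind: VolumeGroupBackup
metadata:
  name: my-app-backup
  namespace: longhorn-system
spec:
  # Volumes to snapshot together at a single point in time, as one atomic group.
  volumes:
    - pvc-1ab2c3d4
    - pvc-5ef6a7b8
  # Quiesce/freeze the workload filesystems before taking the group snapshot.
  quiesce: true
```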

@c3y1huang
Contributor

Backlogging this first to work on higher-priority items.
