Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSI support proposal #1661

Open
wants to merge 14 commits into
base: master
from

Conversation

@nrb
Copy link
Member

commented Jul 12, 2019

Remaining:

  • disabling CSI snapshotting, similar to disabling Velero-native snapshots
  • skipping CSI snapshotting in the plugins if the volume is covered by restic
  • mention feature flags/gates
  • use ownerRefs
  • deleting the underlying snapshot when the backup expires (using a Retain deletion policy means it will not be cleaned up by default)

Signed-off-by: Nolan Brubaker brubakern@vmware.com

design/csi-snapshots.md Outdated Show resolved Hide resolved

@nrb nrb added the Design label Jul 16, 2019

@nrb nrb requested review from carlisia and prydonius Jul 17, 2019

design/csi-snapshots.md Outdated Show resolved Hide resolved
design/csi-snapshots.md Outdated Show resolved Hide resolved
It was [introduced with dynamic provisioning support][15] in 2016, predating CSI.

In the `BackupItemAction` for PVCs, the associated PV will be queried and checked for the presence of `PersistentVolume.Spec.PersistentVolumeSource.CSI`.
Volumes with any other `PersistentVolumeSource` set will use Velero's current VolumeSnapshotter plugin code path.

This comment has been minimized.

Copy link
@prydonius

prydonius Jul 24, 2019

Collaborator

Is there a case where you'd want to use Velero's current VolumeSnapshotter instead of CSI even if CSI was used to provision the volume? For example, if the CSI driver supported provisioning but not snapshotting?

This comment has been minimized.

Copy link
@nrb

nrb Jul 25, 2019

Author Member

Hm, that's a good possibility. I know there's gRPC methods for finding out if the driver supports snapshots, but I'd have to check to see if we can query the driver from k8s.

This comment has been minimized.

Copy link
@skriss

skriss Aug 12, 2019

Contributor

Our in-tree VolumeSnapshotters won't work as-is for CSI volumes, even if they're on the right platform, due to how the GetVolumeID/SetVolumeID functions work (and maybe other things too, not sure). We could look at changing that if we did want to be able to support using them to snapshot CSI volumes, but right now it won't work.

This comment has been minimized.

Copy link
@nrb

nrb Aug 14, 2019

Author Member

Update: currently there is no way to query what a driver supports from k8s. So we can't really know if CSI snapshotting is supported via a query to something like the storageclass at the moment.

I would say to simplify things, if CSI volumes are in use but the driver doesn't have snapshotting, restic would be the best option.

@timh
Copy link

left a comment

My direct comments are all just typos/re-words.
Generally, though, this looks good. In the list of "changes needed in plugins", though, it'd be good to see a "To support aspect X of feature Y, we need to change Z". You're just including "Z" without the context. IMO, this design doc reads pretty densely to someone not deeply familiar with Velero's implementation.

design/csi-snapshots.md Outdated Show resolved Hide resolved
design/csi-snapshots.md Outdated Show resolved Hide resolved
design/csi-snapshots.md Outdated Show resolved Hide resolved
design/csi-snapshots.md Outdated Show resolved Hide resolved
design/csi-snapshots.md Outdated Show resolved Hide resolved
@carlisia
Copy link
Member

left a comment

I read a bit about CSI and read this closely a few times. No input at the moment, lgtm.

@nrb

This comment has been minimized.

Copy link
Member Author

commented Aug 8, 2019

Per feedback, we need a way of telling Velero not to create CSI snapshots. Maybe use the SnapshotVolumes field on the backup?

@skriss

This comment has been minimized.

Copy link
Contributor

commented Aug 12, 2019

I know you brought up feature flags recently - I do think we should consider adding them, and adding one for the CSI work, as we're developing here. I imagine we're going to ship this in parts (e.g. first add support for creating CSI snapshots, then add support for restoring them, ...) and it would be nice to allow users to opt-in until they're ready.

@skriss

This comment has been minimized.

Copy link
Contributor

commented Aug 12, 2019

One other thing to think about - similar to existing code, we'll want to skip creating snapshots if restic is being used to back up a PVC - I'm not exactly sure how we'll do that given that the CSI code is in plugins. Need to think about it.

@nrb

This comment has been minimized.

Copy link
Member Author

commented Aug 12, 2019

I created #1759 for the feature flag work. It likely needs a bit more detail, but at least we have a place to discuss it now.

@nrb nrb changed the title Initial CSI proposal CSI support proposal Aug 13, 2019

nrb added 13 commits Jul 12, 2019
Initial CSI proposal
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Address some questions, reorganize some info
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Add existing GH issues to the document
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Addressing some review feedback
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Typo fixes
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Reformat plugins for clearer separation
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Context for why each plugin is necessary, ownerRefs
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Add more GC/deletion considerations
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Respect the SnapshotVolumes boolean on the backup
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Feature flag mentions and restic brainstorming
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Clarifications
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Use AppliesTo function to select relevant objects
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
Add restic-backed PVC selection logic
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>

@nrb nrb force-pushed the nrb:csi-snapshot-proposal branch from 33cf33e to 7d3ab7b Aug 19, 2019

@nrb

This comment has been minimized.

Copy link
Member Author

commented Aug 19, 2019

@skriss @prydonius @carlisia I believe all the items in the checklist are now addressed. I've also done a bit of reorganization of the sections. Please take another look!

@carlisia
Copy link
Member

left a comment

lgtm 👍


To ensure this is the case, server code must be extended so that PVCs are marked in such a way that the CSI BackupItemAction plugin is not invoked for PVCs known to be backed up by restic.

The restic backupper's `BackupPodVolumes` method will be extended to get PersistentVolumeClaims where relevant and add a `velero.io/restic-backed: true` label.

This comment has been minimized.

Copy link
@skriss

skriss Aug 19, 2019

Contributor

I see a couple of challenges with this approach - (1) BackupPodVolumes will actually run after the PVC has been backed up, since the PVC is backed up as an additional item for the pod prior to the restic backups running; and (2) the label value would need to have the name or UID of the backup, so we know if the PVC is being backed up with restic for this backup, not just any previous backup.

One alternative approach for consideration - you could have the csi-pvc backup item action list all pods in the namespace that the PVC is in, find the pod that is mounting the PVC, and check if it's annotated for restic backup. If we do something like that, it means we don't need to add a new label/annotation which I think is desirable.

This comment has been minimized.

Copy link
@skriss

skriss Aug 19, 2019

Contributor

Longer-term, I wonder if we should pass a richer "context" object to our plugins that includes information like this (i.e. the resticSnapshotTracker). Right now one of the drawbacks of implementing logic in plugins is that you don't get access to some of those internal data structures used by pkg/backup or pkg/restore to track what's happening. If we could expose those to plugins, things would be a lot easier.

This comment has been minimized.

Copy link
@nrb

nrb Aug 19, 2019

Author Member

Yeah, I really wanted to use the resticSnapshotTracker, but opted for the label since that could be used with the AppliesTo function, thereby not even invoking the Execute method for csi-pvc when a PVC should be skipped. Was hoping to avoid too many extra API server calls to determine a no-op, but it may be unavoidable for now.

I'm not sure I want to completely redo the plugin interface in this proposal, as I'd like for CSI support to be a 1.x feature. However, I think providing some sort of context object would be very useful for plugins in the future.

Thanks for the call out on the PVCs being backed up during the Pod's process; I had forgotten that the PVCs were backed up immediately instead of being added to the PVC group's loop.

Your alternative sounds do-able, I think.

Update restic treatment
Signed-off-by: Nolan Brubaker <brubakern@vmware.com>
@nrb

This comment has been minimized.

Copy link
Member Author

commented Aug 27, 2019

I'm going to hold off on updating this pending decisions made in kubernetes/kubernetes#81792 and kubernetes/enhancements#989

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.