csi-manila: added initial support for snapshots #652

Merged
merged 6 commits into kubernetes:master on Jul 16, 2019

Conversation

@gman0 (Contributor) commented Jun 21, 2019

What this PR does / why we need it:
This PR adds initial support for snapshots and for restoring volumes from snapshots. Docs and examples are updated as well, plus some minor refactoring and clean-up.

Things that are still missing and will be added in the upcoming PRs:

  • support for snapshot restoration of CephFS shares: Manila doesn't support "create share from snapshot" for CephFS just yet (see this support matrix), meaning that in this case the driver will have to do the storage-specific heavy lifting on its own
  • too generic error messages from the backend (not necessarily linked to snapshots only): if something goes wrong (e.g. the storage quota is exceeded when creating a new share), the relevant resource goes into an error state. The driver currently only reports that the operation failed because the resource is in that error state, but not much more, which is not exactly helpful to the end user (see the sketch below this list). There is an API to fetch the actual error messages from Manila, but we're lacking its client-side support in the gophercloud framework. This needs to be added to gophercloud in order to improve error reporting.
  • CI jobs: we're lacking CI jobs (#630); the plan is to test the whole life-cycle of a share, that is create->snapshot->restore->mount->delete
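
To illustrate the error-reporting limitation above: all the driver can see today is the resource status reported by Manila. Below is a minimal sketch (not the driver's actual code) of that behavior, assuming the gophercloud shared file systems client; the helper name and polling interval are illustrative.

package main

import (
    "fmt"
    "time"

    "github.com/gophercloud/gophercloud"
    "github.com/gophercloud/gophercloud/openstack/sharedfilesystems/v2/shares"
)

// waitForShareAvailable polls a share until it becomes available.
// When the share lands in the "error" state, a generic error is all that
// can be returned: the actual failure reason lives behind Manila's user
// messages API, which gophercloud does not expose yet.
func waitForShareAvailable(client *gophercloud.ServiceClient, shareID string) error {
    for {
        share, err := shares.Get(client, shareID).Extract()
        if err != nil {
            return fmt.Errorf("failed to probe share %s: %v", shareID, err)
        }

        switch share.Status {
        case "available":
            return nil
        case "error":
            return fmt.Errorf("share %s is in error state", shareID)
        }

        time.Sleep(5 * time.Second)
    }
}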

Release note:

CSI Manila: added initial support for snapshots
@k8s-ci-robot (Contributor) commented Jun 21, 2019

Hi @gman0. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@theopenlab-ci commented Jun 21, 2019

Build succeeded.

@theopenlab-ci commented Jun 21, 2019

Build failed.

			return nil, fmt.Errorf("failed to probe for snapshot: %v", err)
		}
	} else {
		klog.V(4).Infof("snapshot %s (snapshot ID %s) already exists", snapID, snapshot.ID)

@tsmetana (Member) commented Jun 21, 2019

I think we should also check whether the sourceShareID is the same. Calling CreateSnapshot with the existing SnapshotID but different source is also an error.

@gman0 (Author) commented Jun 21, 2019

Good point, thanks!

@adisky (Contributor) commented Jun 24, 2019

+1. If the sourceShareID is different, we need to return a codes.AlreadyExists error.

@gman0 (Author) commented Jul 5, 2019

ID validation added in a8bdd23
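
For illustration, the kind of check discussed in this thread might look roughly like the sketch below; the package, function, and parameter names are assumptions for this example, not the code added in a8bdd23.

package manila // hypothetical package name for this sketch

import (
    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

// validateSnapshotSource rejects a CreateSnapshot call that reuses an
// existing snapshot name for a different source share, as CSI requires:
// same name + different source must fail with codes.AlreadyExists.
func validateSnapshotSource(existingSnapshotShareID, requestedSourceShareID, snapName string) error {
    if existingSnapshotShareID != requestedSourceShareID {
        return status.Errorf(codes.AlreadyExists,
            "snapshot %s already exists, but belongs to a different share (%s)",
            snapName, existingSnapshotShareID)
    }
    return nil
}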


func newSnapshotID(snapshotName string) snapshotID {
	return snapshotID(fmt.Sprintf("csi-manila-%s", snapshotName))
}

@tsmetana (Member) commented Jun 21, 2019

Shouldn't the IDs be more unique? Won't this end with too frequent ID collisions?

@gman0 (Author) commented Jun 21, 2019

I don't think so - at least from the perspective of a single CO, because the resource names it generates are implied to be unique for that particular CO instance (otherwise it couldn't guarantee idempotency). We could use the actual resource ID from Manila as the volume/snapshot ID, though - that would make it easier to find the resource on the backend... so this is definitely something to reconsider, but it probably won't help with uniqueness.

If what you mean is the situation where two COs generate the same name, then yes, only one of them would create the resource and the other one would just fetch the existing storage asset - and this is indeed a collision, because in this situation the intention was to have two distinct resources. So if we have a single Manila instance and multiple COs (each running an instance of csi-manila) claiming assets on that Manila instance, each instance of the driver could receive some unique string during startup (e.g. via cmd args / env vars)... then we could have a hashing function where
hash(input-unique-string, resource-name) = resource-id

What do you think?
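
(Purely for illustration, a runnable sketch of the hashing idea above; the per-instance unique string, the function name, and the ID prefix are all hypothetical and not part of this PR:)

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
)

// resourceID sketches hash(input-unique-string, resource-name) = resource-id:
// each csi-manila instance would get its own unique string at startup, so two
// COs that happen to generate the same resource name still map to distinct IDs.
func resourceID(instanceUniqueString, resourceName string) string {
    sum := sha256.Sum256([]byte(instanceUniqueString + "/" + resourceName))
    return "csi-manila-" + hex.EncodeToString(sum[:16])
}

func main() {
    fmt.Println(resourceID("cluster-a", "pvc-30f456e8")) // distinct IDs for...
    fmt.Println(resourceID("cluster-b", "pvc-30f456e8")) // ...the same resource name
}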

@tsmetana (Member) commented Jun 21, 2019

Snapshots are namespaced (just like PVCs): a user in namespace A wants snapshot snap, a user in namespace B also wants snapshot snap (I tend to do this myself -- reusing yamls for different things in one cluster), and this code would assign both snapshots the same ID. And think about special cases where a user creates a snapshot, deletes it and creates another one with the same name... This must not break either.

@gman0 (Author) commented Jun 22, 2019

Sorry, I'm not following... sure, PVCs and snapshots can have matching names (in their object metadata) in separate namespaces, but those are not exposed to CSI. CSI Create{Volume,Snapshot} receives the UIDs of those objects (stored in CreateVolumeRequest.Name and CreateSnapshotRequest.Name respectively), and these are certainly unique.

I tried to reproduce the scenario you've described:
I've created 2 PVCs named new-nfs-share-pvc in namespaces default and kube-public

I0622 11:38:09.066939       1 driver.go:265] [ID:14] GRPC call: /csi.v1.Controller/CreateVolume
I0622 11:38:09.067183       1 driver.go:266] [ID:14] GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-30f456e8-94e2-11e9-809c-52540013b951","parameters":{"type":"default"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":5}}]}
I0622 11:38:10.512865       1 volumesource.go:37] creating a share for volume csi-manila-pvc-30f456e8-94e2-11e9-809c-52540013b951

I0622 11:40:53.646209       1 driver.go:265] [ID:19] GRPC call: /csi.v1.Controller/CreateVolume
I0622 11:40:53.646422       1 driver.go:266] [ID:19] GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-91097c3e-94e2-11e9-809c-52540013b951","parameters":{"type":"default"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":5}}]}
I0622 11:40:53.755975       1 volumesource.go:37] creating a share for volume csi-manila-pvc-91097c3e-94e2-11e9-809c-52540013b951

Those are 2 CreateVolume requests for 2 PVCs with the same name but different namespace.

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                           STORAGECLASS     REASON   AGE
pvc-30f456e8-94e2-11e9-809c-52540013b951   1Gi        RWX            Delete           Bound    default/new-nfs-share-pvc       csi-manila-nfs            57m
pvc-91097c3e-94e2-11e9-809c-52540013b951   1Gi        RWX            Delete           Bound    kube-public/new-nfs-share-pvc   csi-manila-nfs            55m

$ kubectl describe pv pvc-30f456e8-94e2-11e9-809c-52540013b951
Name:            pvc-30f456e8-94e2-11e9-809c-52540013b951
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: manila.csi.openstack.org
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    csi-manila-nfs
Status:          Bound
Claim:           default/new-nfs-share-pvc
Reclaim Policy:  Delete
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        1Gi
Node Affinity:   <none>
Message:         
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            manila.csi.openstack.org
    VolumeHandle:      csi-manila-pvc-30f456e8-94e2-11e9-809c-52540013b951
    ReadOnly:          false
    VolumeAttributes:      cephfs-mounter=fuse
                           shareAccessID=95a4ca19-ae10-4457-b5e0-c57e955249de
                           shareID=5e123251-fe73-44aa-8124-4e624b401c1d
                           storage.kubernetes.io/csiProvisionerIdentity=1561201976886-8081-manila.csi.openstack.org

$ kubectl describe pv pvc-91097c3e-94e2-11e9-809c-52540013b951
Name:            pvc-91097c3e-94e2-11e9-809c-52540013b951
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: manila.csi.openstack.org
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    csi-manila-nfs
Status:          Bound
Claim:           kube-public/new-nfs-share-pvc
Reclaim Policy:  Delete
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        1Gi
Node Affinity:   <none>
Message:         
Source:
    Type:              CSI (a Container Storage Interface (CSI) volume source)
    Driver:            manila.csi.openstack.org
    VolumeHandle:      csi-manila-pvc-91097c3e-94e2-11e9-809c-52540013b951
    ReadOnly:          false
    VolumeAttributes:      cephfs-mounter=fuse
                           shareAccessID=5cda7e52-350a-4fff-980d-8fc8790dcea1
                           shareID=ebf07997-83bf-436f-b305-45d96d32ed71
                           storage.kubernetes.io/csiProvisionerIdentity=1561201976886-8081-manila.csi.openstack.org

First snapshot of default/new-nfs-share-pvc:

I0622 11:42:02.044179       1 driver.go:265] [ID:24] GRPC call: /csi.v1.Controller/CreateSnapshot
I0622 11:42:02.044200       1 driver.go:266] [ID:24] GRPC request: {"name":"snapshot-bcd02708-94e2-11e9-809c-52540013b951","secrets":"***stripped***","source_volume_id":"csi-manila-pvc-30f456e8-94e2-11e9-809c-52540013b951"}
I0622 11:42:02.206078       1 controllerserver.go:252] creating snapshot csi-manila-snapshot-bcd02708-94e2-11e9-809c-52540013b951 of volume csi-manila-pvc-30f456e8-94e2-11e9-809c-52540013b951

$ kubectl describe volumesnapshot.snapshot.storage.k8s.io/new-nfs-share-snap                                                                                                                                     
Name:         new-nfs-share-snap
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  snapshot.storage.k8s.io/v1alpha1
Kind:         VolumeSnapshot
Metadata:
  Creation Timestamp:  2019-06-22T11:41:56Z
  Finalizers:
    snapshot.storage.kubernetes.io/volumesnapshot-protection
  Generation:        5
  Resource Version:  1849397
  Self Link:         /apis/snapshot.storage.k8s.io/v1alpha1/namespaces/default/volumesnapshots/new-nfs-share-snap
  UID:               bcd02708-94e2-11e9-809c-52540013b951

Second snapshot, of kube-public/new-nfs-share-pvc:

I0622 11:55:17.826043       1 driver.go:265] [ID:38] GRPC call: /csi.v1.Controller/CreateSnapshot
I0622 11:55:17.826171       1 driver.go:266] [ID:38] GRPC request: {"name":"snapshot-0eee019e-94e4-11e9-809c-52540013b951","secrets":"***stripped***","source_volume_id":"csi-manila-pvc-91097c3e-94e2-11e9-809c-52540013b951"}
I0622 11:55:17.992888       1 controllerserver.go:252] creating snapshot csi-manila-snapshot-0eee019e-94e4-11e9-809c-52540013b951 of volume csi-manila-pvc-91097c3e-94e2-11e9-809c-52540013b951

$ kubectl -n kube-public describe volumesnapshot.snapshot.storage.k8s.io/new-nfs-share-snap
Name:         new-nfs-share-snap
Namespace:    kube-public
Labels:       <none>
Annotations:  <none>
API Version:  snapshot.storage.k8s.io/v1alpha1
Kind:         VolumeSnapshot
Metadata:
  Creation Timestamp:  2019-06-22T11:51:23Z
  Finalizers:
    snapshot.storage.kubernetes.io/volumesnapshot-protection
  Generation:        5
  Resource Version:  1850153
  Self Link:         /apis/snapshot.storage.k8s.io/v1alpha1/namespaces/kube-public/volumesnapshots/new-nfs-share-snap
  UID:               0eee019e-94e4-11e9-809c-52540013b951
Spec:
  Snapshot Class Name:    csi-manila-snapclass
  Snapshot Content Name:  snapcontent-0eee019e-94e4-11e9-809c-52540013b951
  Source:
    API Group:  <nil>
    Kind:       PersistentVolumeClaim
    Name:       new-nfs-share-pvc

None of those resources' IDs match, despite their matching object metadata names in k8s.

And think about special cases where user creates a snapshot, deletes it and creates another one with the same name... This must not break either.

I0622 13:07:24.104486       1 driver.go:265] [ID:17] GRPC call: /csi.v1.Controller/CreateSnapshot
I0622 13:07:24.104630       1 driver.go:266] [ID:17] GRPC request: {"name":"snapshot-acfa5443-94ee-11e9-a2e2-52540013b951","secrets":"***stripped***","source_volume_id":"csi-manila-pvc-9798c34e-94ee-11e9-a2e2-52540013b951"}
I0622 13:07:24.338836       1 controllerserver.go:252] creating snapshot csi-manila-snapshot-acfa5443-94ee-11e9-a2e2-52540013b951 of volume csi-manila-pvc-9798c34e-94ee-11e9-a2e2-52540013b951

I0622 13:09:04.165481       1 driver.go:265] [ID:24] GRPC call: /csi.v1.Controller/CreateSnapshot
I0622 13:09:04.165500       1 driver.go:266] [ID:24] GRPC request: {"name":"snapshot-e8c30c85-94ee-11e9-a2e2-52540013b951","secrets":"***stripped***","source_volume_id":"csi-manila-pvc-9798c34e-94ee-11e9-a2e2-52540013b951"}
I0622 13:09:04.321829       1 controllerserver.go:252] creating snapshot csi-manila-snapshot-e8c30c85-94ee-11e9-a2e2-52540013b951 of volume csi-manila-pvc-9798c34e-94ee-11e9-a2e2-52540013b951

It can't break, their UIDs are different.

The CSI spec says that volume/snapshot names are there as a suggested ID (for the CreateVolume, DeleteVolume, ListVolumes RPC interactions) and to maintain idempotency between requests with the same name. If those names were not unique, I don't think they could maintain idempotency.

What do you think? Or can you elaborate more regarding where we might be in a risk of ID collision?

@adisky (Contributor) commented Jun 24, 2019

I have not read the whole conversation, but what is the benefit of generating IDs here? Why can't we use the ID from OpenStack as it is?

@tsmetana (Member) commented Jun 24, 2019

My bad: I didn't realize where these names come from. They should be unique enough for none of what I was afraid of to happen... I got triggered by the fact that we call something an ID while relying on an external source for its uniqueness.
@adisky Yes, that would also be an option; @gman0 mentions this in the comment above too. I think the reason to use our own ID is simply that it's easier and good enough for the moment.

@gman0 (Author) commented Jul 5, 2019

Using OpenStack resource IDs as volume/snapshot IDs: a8bdd23
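
(For illustration, roughly what that approach amounts to in a CSI CreateVolume response; this is a sketch under assumed names, not the actual change in a8bdd23:)

package manila // hypothetical package name for this sketch

import (
    "github.com/container-storage-interface/spec/lib/go/csi"
    "github.com/gophercloud/gophercloud/openstack/sharedfilesystems/v2/shares"
)

// createVolumeResponse returns the share ID assigned by Manila as the CSI
// volume ID, so the backend resource can be located directly from the handle.
func createVolumeResponse(share *shares.Share, requiredBytes int64) *csi.CreateVolumeResponse {
    return &csi.CreateVolumeResponse{
        Volume: &csi.Volume{
            VolumeId:      share.ID, // Manila's own resource ID
            CapacityBytes: requiredBytes,
        },
    }
}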

@jichenjc (Contributor) commented Jun 24, 2019

/ok-to-test

@adisky (Contributor) commented Jun 24, 2019

/test cloud-provider-openstack-acceptance-test-e2e-conformance

@adisky (Contributor) commented Jun 24, 2019

@gman0 Thanks for the PR!!
A few questions inline. It is also suggested to add sanity testing (in a separate PR) using
https://github.com/kubernetes-csi/csi-test/tree/master/pkg/sanity; it makes it easier to track validation and the returned error codes.

@theopenlab-ci commented Jun 24, 2019

Build failed.

@adisky (Contributor) commented Jul 3, 2019

@gman0 This PR needs a rebase

@theopenlab-ci commented Jul 5, 2019

Build succeeded.

@adisky (Contributor) commented Jul 8, 2019

@gman0 Is it in its final shape from your side?

@gman0 (Author) commented Jul 8, 2019

@adisky oops, I forgot to post a comment... yes, this PR is done and waiting for LGTMs.

PTAL @tsmetana @tombarron @gouthampacha

@dims removed their request for review Jul 8, 2019

@tsmetana (Member) left a comment

It's really big, so I hope I have not overlooked something... However, I have not seen anything that should block the merge.

@ramineni (Contributor) left a comment

Thanks. Some comments inline, but I'm OK if they are addressed in a different PR, considering the recent inconsistencies with the gate.
Also, maybe you could consider adding a minimal gate job for it as well.

@@ -40,6 +40,22 @@ spec:
volumeMounts:
- name: socket-dir
mountPath: /var/lib/kubelet/plugins/csi-nodeplugin-manilaplugin
- name: csi-snapshotter
image: quay.io/k8scsi/csi-snapshotter:v1.1.0

@ramineni (Contributor) commented Jul 11, 2019

There is a new version of the snapshotter, 1.2.0 (https://kubernetes-csi.github.io/docs/external-snapshotter.html); maybe you could update it.

@@ -67,7 +67,7 @@ A single instance of the driver may serve only a single Manila share protocol. T

### Kubernetes 1.13+

Required feature gates: `CSIDriverRegistry`, `CSINodeInfo`
Required feature gates: `CSIDriverRegistry=true`, `CSINodeInfo=true`. Snapshots require `VolumeSnapshotDataSource=true` feature gate.

@ramineni (Contributor) commented Jul 11, 2019

`CSIDriverRegistry` and `CSINodeInfo` are beta features from 1.14, so they are enabled by default and no explicit enabling is required; it is only needed when using a k8s version older than 1.14. Maybe you could add a note, perhaps in a different PR.

image: quay.io/k8scsi/csi-snapshotter:v1.1.0
args:
- "--csi-address=$(ADDRESS)"
- "--connection-timeout=15s"

@ramineni (Contributor) commented Jul 11, 2019

connection-timeout is deprecated in 1.1.0 and needs to be removed.

verbs: ["get", "list", "watch"]
- apiGroups: ["apiextensions.k8s.io"]
resources: ["customresourcedefinitions"]
verbs: ["create"]
- apiGroups: ["csi.storage.k8s.io"]

@ramineni (Contributor) commented Jul 11, 2019

This needs to be updated as well; it's been moved to the storage.k8s.io group.

@adisky (Contributor) commented Jul 11, 2019

Considering the above comments will be addressed in a separate PR:
/lgtm

@adisky (Contributor) commented Jul 11, 2019

@gman0 Please add minimal gate jobs before continuing with features.

@gman0 (Author) commented Jul 11, 2019

@ramineni thanks for your feedback! @adisky will do, thanks!

@gman0 (Author) commented Jul 11, 2019

I'll do a more thorough manifest refactoring in a separate PR

@adisky (Contributor) commented Jul 16, 2019

/approve

@k8s-ci-robot (Contributor) commented Jul 16, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adisky

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot merged commit 9fa54fa into kubernetes:master on Jul 16, 2019

16 checks passed

cla/linuxfoundation: gman0 authorized
openlab/cloud-provider-openstack-acceptance-test-csi-cinder: success
openlab/cloud-provider-openstack-acceptance-test-e2e-conformance: success
openlab/cloud-provider-openstack-acceptance-test-e2e-conformance-stable-branch-v1.13: success
openlab/cloud-provider-openstack-acceptance-test-e2e-conformance-stable-branch-v1.14: success
openlab/cloud-provider-openstack-acceptance-test-e2e-conformance-stable-branch-v1.15: success
openlab/cloud-provider-openstack-acceptance-test-flexvolume-cinder: success
openlab/cloud-provider-openstack-acceptance-test-k8s-cinder: success
openlab/cloud-provider-openstack-acceptance-test-keystone-authentication-authorization: success
openlab/cloud-provider-openstack-acceptance-test-lb-octavia: success
openlab/cloud-provider-openstack-acceptance-test-standalone-cinder: success
openlab/cloud-provider-openstack-format: success
openlab/cloud-provider-openstack-unittest: success
pull-cloud-provider-openstack-check: job succeeded
pull-cloud-provider-openstack-test: job succeeded
tide: in merge pool