Multi Tenancy for Persistent Volumes #47326
@krmayankk There are no sig labels on this issue. Please add a sig label.
/sig storage
Some answers from the mailing list:
@davidopp Could you elaborate on the inefficient utilization you mention in the following? "If your set of tenants is very static, I guess you could have one StorageClass per tenant and only use the "recycle" reclaim policy (which seems to be what you're advocating). But this seems pretty inefficient from a utilization standpoint, as you'd end up accumulating the max number of PVs used by each tenant."
@mikedanese Yes, encryption seems like a reasonable solution as long as we can separate the encryption keys per tenant and only the tenant has access to its own keys. EBS volumes are the only ones supporting encryption, so for RBD we are out of luck. Is there a general pattern for doing encryption with external provisioning? What about incorporating tenancy in in-tree plugins?
@smarterclayton Doesn't OpenShift have use cases for keeping the volumes around after the pods are gone? Does OpenShift do any kind of multi-tenancy on top?
@thockin thanks. I'm not advocating that PVs should or should not be namespaced, just trying to understand the rationale. One thought was that if PVs were dynamically provisioned in the customer's namespace, that might further limit their access.
@jeffvance your understanding of my question is correct. Could you link the issues that talk about ACL'ing PVs? Overall I want a multi-tenant model where, in the absence of encryption, there is a Kubernetes-native way of ACL'ing PVs that doesn't rely on the underlying storage implementation.
@krmayankk I don't think volume ACLs are a formal issue yet. I've cc'd @childsb and @erinboyd since I think they may have some preliminary notes on this topic.
If we assume each customer gets a StorageClass, you can use ResourceQuotas (per-namespace) to prevent certain namespaces from requesting storage, i.e. set <storage-class-name>.storageclass.storage.k8s.io/persistentvolumeclaims to 0 for everybody but customer x. You can also set the claimRef namespace field in PVs to restrict each PV to PVCs from a certain namespace; see the sketch below. (Not very elegant solutions, IMO/admittedly :))
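For illustration, a rough sketch of both approaches. All class, namespace, volume, and claim names are placeholders; the NFS backend is just an example volume source.

```yaml
# Block namespace "other-tenant" from claiming storage of class
# "gold-customer-a" by setting its PVC quota for that class to 0.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: deny-customer-a-storage
  namespace: other-tenant
spec:
  hard:
    gold-customer-a.storageclass.storage.k8s.io/persistentvolumeclaims: "0"
---
# Pre-bind a PV to a specific namespace by setting claimRef, so it can
# only be bound by the named PVC in that namespace.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-customer-a-0
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    namespace: customer-a
    name: data-claim-0
  nfs:
    server: nfs.example.com
    path: /exports/customer-a
```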
Regarding your question about AttachDisk, it occurs on the master node by default, not in the kubelet.
@msau42 Specifically for EBS and RBD, where does AttachDisk happen: on the master or on the node? I see code in both the kubelet and the controller. How do we know where this is happening? Is this configurable for each volume type?
By default, the attach/detach controller runs in the master. It invokes the plugin's attach routine through the operation executor. There is a kubelet option to enable the attach/detach logic in the kubelet, but it is off by default, and it is not on a per-volume basis.
Any volume plugin that is backed by a block device can probably support LUKS. @kubernetes/sig-storage-feature-requests have we ever discussed LUKS encryption layers for block device volumes?
Interesting, @rootfs @msau42. So in RBD, why do we need the user secret in the user's namespace (or is my understanding wrong)? Put another way, RBD needs two secrets, admin and user. My understanding is that the user secret must be in the same namespace as the PVC and is used for AttachDisk. If AttachDisk happens on the master, we should allow the user secret to be in any namespace, or accept a namespace field for it.
rbd doesn't support 3rd-party attach, so rbd map has to happen on the kubelet. The rbd admin and user keyrings serve different purposes: the admin keyring is for rbd image provisioning (admin privilege), while the user keyring is for rbd map (non-admin privilege). Pods that use an rbd image don't have admin keyrings.
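For context, a minimal sketch of how the in-tree RBD StorageClass keeps the two keyrings separate. The monitor address, pool, user IDs, and secret names below are placeholders.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd-customer-a
provisioner: kubernetes.io/rbd
parameters:
  monitors: 10.0.0.1:6789
  pool: customer-a-pool
  adminId: admin
  # admin keyring: used only by the provisioner to create/delete images
  adminSecretName: ceph-admin-secret
  adminSecretNamespace: kube-system
  userId: customer-a
  # user keyring: read by the kubelet for rbd map; this secret lives in
  # the PVC's namespace
  userSecretName: ceph-user-secret
```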
The attach operation is only performed by the attach/detach controller, and that controller only runs in the master node by default. There is a kubelet option to turn on attach/detach on the node, but it's going to be removed.
That's already implemented. If a PV gets bound to a PVC, the PV can never get bound to another PVC. Only pods in the same namespace as the PVC can use it. When the PVC is deleted, the PV is Released or deleted, depending on its reclaim policy, and is never bound again.
I think one key point from what @krmayankk describes is the identity of a PV (e.g. a PV created by a tenant, a PV of an internal customer). After binding, the PV carries the identity of the PVC, but the binding process itself doesn't seem to take that into consideration. From what I know, it looks at the selector, storage class, access mode, etc. As of now, it seems the best way is to mimic identity information using selectors and storage classes. @krmayankk why do you want to
@jsafrane @ddysher In the case of dynamic provisioning, the reclaim policy is always Delete, but enterprises might still want to keep the PV and not delete it immediately, for safety reasons. So we can't make use of the Released phase to prevent the binding. The identity of the PV is important because under no circumstance do we want the binding process to accidentally bind a PV of one customer to a PVC of a different customer, and there is nothing that prevents it. Agreed that once a PVC is bound to a PV it cannot bind to another PV, but if we accidentally end up with some unbound PVs (due to a bug or whatever) from customer A, we don't want them to get bound to customer B's PVCs. The only way I can think of to prevent that today is to assign per-customer storage classes, as in the sketch below.
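As an illustration of that per-customer StorageClass approach (the class name, provisioner, namespace, and sizes are placeholders):

```yaml
# One StorageClass per tenant; PVCs in each tenant's namespace request
# only their own class, so dynamically provisioned PVs never mix tenants.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-customer-a
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-0
  namespace: customer-a
spec:
  storageClassName: ssd-customer-a
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
```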
@krmayankk once the PV is deleted, there is no PV object anymore, so you can't accidentally bind to it. You would have to recreate the PV, either statically or through dynamic provisioning. Is what you really want the ability to specify a Retain policy for dynamically provisioned PVs, so that you have a chance to clean up the data before it goes back into the provisioning pool?
@msau42 Since dynamic provisioning doesn't support a reclaim policy and always deletes, what we are doing is not deleting the PVCs when deleting StatefulSets. That way we explicitly GC the PVCs, and hence the PVs, at a later time. While the PVC (and hence the PV) is waiting to be GC'd, I am worried that the PVC could get unbound due to bugs and the PV would become available for binding by other tenants. Do you think this is possible? If the PVC/PV somehow got unbound for a dynamically provisioned PV, would the phase of the PV be Released or Available?
If a PV is bound to a PVC, then there are two pieces of information about the two-way binding: the PV records the claim it is bound to in pv.spec.claimRef, and the PVC records the volume in pvc.spec.volumeName.
My suspicion is that in between, if another PVC is also pending, it's possible that the other PVC will bind to the PV and the original PVC will just stay as is (the pv_controller keeps trying to rebind and keeps failing). If my understanding is correct, I think the likelihood that the PV is bound by another tenant is pretty low, if not impossible.
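For reference, this is roughly what the two sides of an established binding look like. Names and the NFS backend are hypothetical; the controller fills in claimRef and volumeName at bind time.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-customer-a-0
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: nfs.example.com
    path: /exports/customer-a
  # set by the controller when the PV is bound
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    namespace: customer-a
    name: data-0
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-0
  namespace: customer-a
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  # set by the controller when the claim is bound
  volumeName: pv-customer-a-0
```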
Assuming no Recycle policy, once the PV is unbound it always goes to the Released phase (and stays there if the policy is Retain), or the PV object is deleted entirely. The same PV object cannot go back to the Available state, so the same PV object is never recycled; it's always a new PV object. Now, whether or not the data on the underlying backing volume gets cleaned up before being put back into the storage pool (used for dynamic provisioning) is a different story, and it depends on each provider. For example, for GCE PD, when the disk gets deleted it is guaranteed that the content is cleaned up before the underlying volume can be reused for a new disk. For local storage, the provided external provisioner will clean up the data when the PV is released. For other volume plugins that may not be the case, and I believe that is why @krmayankk wants the Retain policy, to be able to manually clean up the data on the volume. I still think that if we had the ability to set the reclaim policy to Retain on dynamically provisioned volumes, it would address your concern about cleaning up volumes before they are used again by other tenants.
See also issue #38192, Allow configuration of reclaim policy in StorageClass.
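Once that lands, setting the policy per class should look roughly like this (a sketch; the class name and provisioner are placeholders):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd-retained
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
# PVs dynamically provisioned from this class keep their data after the
# PVC is deleted, so an admin can scrub them before reuse.
reclaimPolicy: Retain
```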
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
@krmayankk sorry to respond to this necro post, but did you come to any conclusion on this? I assume you can create a LimitRange which is really small for namespaces that should be restricted from using certain storage classes, correct? Have CSI plugins in k8s 1.10 and the new storage improvements in 1.11 sorted out any issues? https://kubernetes.io/docs/tasks/administer-cluster/limit-storage-consumption/ Restricting storage access with Ceph is quite easy; not sure about NetApp using Trident or other mechanisms right now. Thanks!
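For reference, the LimitRange approach mentioned above (described in the linked doc) would look roughly like this; the namespace and sizes are placeholders:

```yaml
# Cap PVC sizes in a restricted namespace so tenants there cannot
# request meaningful amounts of storage.
apiVersion: v1
kind: LimitRange
metadata:
  name: storage-limits
  namespace: restricted-tenant
spec:
  limits:
    - type: PersistentVolumeClaim
      max:
        storage: 1Gi
      min:
        storage: 1Mi
```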
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-lifecycle rotten
/reopen
@krmayankk: Reopened this issue in response to the /reopen command above.
/lifecycle frozen
What keywords did you search in Kubernetes issues before filing this one? tenancy
This issue intends to start a discussion on how to do multi-tenancy in Kubernetes, for Persistent Volumes in particular.
My team is currently trying to enable stateful applications for our internal customers. One requirement that keeps coming up is how to isolate the PVs of one internal customer from the PVs of another.
I see the following isolation mechanisms:
While the above isolation is good, it's not enough (as I understand it). In a multi-tenant environment we want mechanisms that can guarantee that a volume allocated to one customer can never be accidentally allocated, mounted, or accessed by another customer.
When using Kubernetes, what should we recommend to our customers?
A few more questions and considerations:
Some form of RBAC might be good, but currently it works cluster-wide or per namespace. Since volumes are cluster-wide resources, we can only do cluster-wide RBAC; but the volumes belong to different tenants, so we want something else (and we still need to define that something else). A sketch of the limitation follows below.
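To make the limitation concrete: because PersistentVolumes are cluster-scoped, any RBAC rule covering them has to live in a ClusterRole granted cluster-wide rather than per tenant namespace. A sketch (the role name is a placeholder):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pv-reader
rules:
  # persistentvolumes is a cluster-scoped resource, so this rule cannot
  # be narrowed to a single tenant's namespace
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch"]
```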
Some related discussion on multi-tenancy here:
#40403