Integration of volume attach limits and delayed volume binding #65201
Comments
It's a little different from what I had in mind.
Looks good. One question regarding:
Will the
What is the use case for supporting arbitrary resources within PVs or StorageClasses? Before we go there, I think we should gather some real-world data on how those resources would be applied and how the scheduler would have to be modified to take them into account. Some open questions:
- What happens when individual PVs can share memory (like a shared FUSE mount)?
- Aren't memory limits on PVs very much storage-implementation specific? What happens to those resource requirements when the storage provider version or implementation changes, such as AWS EFS vs. off-the-shelf NFS, Glusterfs vs. ceph-glusterfs, or Glusterfs v1.x vs. v2.x?
- Can the volume attach limit and the memory limit of a PV conflict with each other? Who wins in that case?
Sorry, I wasn't clear. The total capacity is reported in the Node resources, and the consumption of a to-be-provisioned PV is reported in the StorageClass. Something like capacity would need to be special-cased with a prefix, since the capacity is inferred from the PVC request.
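For context, the node half of this already exists for attach limits: nodes report the total as an allocatable resource. A trimmed node status might look like the sketch below; the exact resource name and count depend on the cloud provider and machine type, and the values here are illustrative.

```yaml
# Trimmed node status; attach limits are already surfaced this way today.
status:
  allocatable:
    attachable-volumes-aws-ebs: "25"   # count is illustrative
    cpu: "4"
    memory: 16Gi
```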
@liggitt had a hypothetical example of a driver RAIDing two EBS volumes together.
Volume drivers like NFS or Ceph have kernel components that can consume system memory and network bandwidth that go unaccounted for. That being said, I agree: we probably have many other volume scaling issues that are more pressing than this. The most important resources we're trying to deal with currently are attachable limits and capacity. Limits for other system resources could potentially be accounted for through attach limits in the near term, but I wanted to see if we could come up with a more general API that could be extended in the future, if needed.
One more use case to further complicate all of this: in addition to attach limits per node, GCE PD also has a maximum capacity per node of 64 TB on most instances. To handle this, I think we need to count the capacity of PVCs on the node (and inline volumes won't work).
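If that were modeled with the same node-resource mechanism, it might look something like the following. This is only a sketch: `attachable-capacity-gce-pd` is a made-up resource name that the scheduler does not understand today, and both quantities are illustrative.

```yaml
# Hypothetical: report total attachable capacity per node alongside the attach
# count, so the scheduler could sum PVC capacity requests against it.
status:
  allocatable:
    attachable-volumes-gce-pd: "127"    # attach count (illustrative)
    attachable-capacity-gce-pd: 64Ti    # hypothetical capacity resource
```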
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
I updated the design doc on how to integrate the max PD predicate with volume topology: kubernetes/community#2711. I left out supporting a volume taking up multiple attach points, and a volume consuming other node resources like memory/CPU. We don't have any concrete use cases or needs for those yet, and we can reconsider if something comes up. The design for integration is much simpler in this case.
/assign
I will pick this up as we discussed. Thanks for updating the design doc!
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is this a BUG REPORT or FEATURE REQUEST?:
@kubernetes/sig-storage-feature-requests
What happened:
This is a follow-up to the 1.11 design discussions regarding volume attach limits and how they will work with delayed volume binding. Here's a rough sketch of how we can integrate the two, based on previous discussions. I'm also trying to consider how this could be extended to work for arbitrary resources.
Each entry in StorageClass.nodeResources would have two fields:
- `name` is the name of the resource that is consumed
- `quantity` is how much it consumes

Currently, the volume max counts predicate handles resources prefixed with "attachable-volumes", and the PodFitsResources predicate handles the rest. Both would have to be extended to also account for resources consumed by unbound and bound PVs via StorageClass.nodeResources.
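As a concrete sketch of the idea: `nodeResources` is the field proposed here, not an existing StorageClass API field, and the provisioner, resource names, and quantities below are illustrative. Each PV of the class would consume the listed node resources.

```yaml
# Illustrative only: nodeResources does not exist in the StorageClass API today.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ebs
provisioner: kubernetes.io/aws-ebs
nodeResources:
- name: attachable-volumes-aws-ebs   # counted by the volume max counts predicate
  quantity: 1
- name: memory                       # counted by the PodFitsResources predicate
  quantity: 128Mi
```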
I'm also trying to see if there's a way we can handle reporting dynamic provisioning capacity for local volumes through a similar or the same mechanism. It would again need a special resource prefix for special handling by the volume binding predicate. And if we want to support more than just local volumes, then we would need a way to specify a topology object kind (i.e. a Rack or Zone CRD, or maybe some first-class TopologyResource object) where the allocatable resources are specified. If so, then this may no longer be just nodeResources. I need to see how much overlap there may be with device manager here.
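Purely as an illustration of that last idea: a TopologyResource object does not exist, and every field below is hypothetical. Such an object might report allocatable storage per topology domain for the volume binding predicate to consume.

```yaml
# Hypothetical object; nothing like this exists today.
apiVersion: storage.k8s.io/v1alpha1
kind: TopologyResource
metadata:
  name: rack-a
spec:
  topology:
    kind: Rack                     # could also be Node or Zone
    name: rack-a
  allocatable:
    local-volume-capacity: 3Ti     # resource name would need a special prefix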
Would like to hear your thoughts on this idea. cc @gnufied @liggitt