
Integration of volume attach limits and delayed volume binding #65201

Closed
msau42 opened this issue Jun 19, 2018 · 15 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@msau42
Member

msau42 commented Jun 19, 2018

Is this a BUG REPORT or FEATURE REQUEST?:
@kubernetes/sig-storage-feature-requests

What happened:
This is a follow-up to the 1.11 design discussions regarding volume attach limits and how they interact with delayed volume binding. Here's a rough sketch of how we can integrate the two, based on previous discussions. I'm also trying to consider how this could be extended to work for arbitrary resources.

  1. Add a new field to StorageClass to indicate how much of each node resource a provisioned volume would consume:

     kind: StorageClass
     ...
     nodeResources:
     - name: "attachable-volumes-gce-pd"
       quantity: 1
     - name: "memory"
       quantity: 100M

     • name is the name of the resource that is consumed
     • quantity is how much of that resource each provisioned volume consumes
  2. Currently, the volume max count predicate handles resources prefixed with "attachable-volumes", and the PodFitsResources predicate handles the rest. Both would have to be extended to also account for the resources consumed by unbound and bound PVs via StorageClass.nodeResources (a rough sketch of this accounting follows after this list).

  3. I'm also trying to see if there's a way we can handle reporting dynamic provisioning capacity for local volumes through a similar or the same mechanism. It would again need a special resource prefix for special handling by the volume binding predicate. And if we want to support more than just local volumes, then we would need a way to specify a topology object kind (i.e. a Rack or Zone CRD, or maybe some first-class TopologyResource object) where the allocatable resources are specified. If so, then this may no longer be just nodeResources. I need to see how much overlap there may be with the device manager here.
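
A minimal sketch of the accounting in point 2, in plain Go with simplified, hypothetical types standing in for the real scheduler and API structures: for each volume to be provisioned, add the quantities declared in its StorageClass.nodeResources to what is already consumed on the node, then check the result against the node's allocatable.

package main

import "fmt"

// Hypothetical, simplified stand-ins for the real API types.
type NodeResource struct {
    Name     string
    Quantity int64
}

type StorageClass struct {
    Name          string
    NodeResources []NodeResource
}

// fitsWithVolumes reports whether provisioning one volume per given
// StorageClass would keep every consumed resource within the node's
// allocatable amounts.
func fitsWithVolumes(allocatable, used map[string]int64, classes []StorageClass) bool {
    // Start from what is already consumed on the node.
    requested := map[string]int64{}
    for name, qty := range used {
        requested[name] = qty
    }
    // Add the per-volume consumption declared by each StorageClass.
    for _, sc := range classes {
        for _, r := range sc.NodeResources {
            requested[r.Name] += r.Quantity
        }
    }
    // Every consumed resource must stay within the node's allocatable.
    for name, qty := range requested {
        if qty > allocatable[name] {
            return false
        }
    }
    return true
}

func main() {
    allocatable := map[string]int64{
        "attachable-volumes-gce-pd": 16,
        "memory":                    8 << 30, // 8Gi in bytes
    }
    used := map[string]int64{"attachable-volumes-gce-pd": 15}
    fast := StorageClass{
        Name: "fast",
        NodeResources: []NodeResource{
            {Name: "attachable-volumes-gce-pd", Quantity: 1},
            {Name: "memory", Quantity: 100 << 20}, // ~100M
        },
    }
    // Two more volumes would need 17 attach points on a node that allows 16.
    fmt.Println(fitsWithVolumes(allocatable, used, []StorageClass{fast, fast})) // false
}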

Would like to hear your thoughts on this idea. cc @gnufied @liggitt

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. kind/feature Categorizes issue or PR as related to a new feature. labels Jun 19, 2018
@lichuqiang
Contributor

lichuqiang commented Jun 19, 2018

It's a little different from what I had in mind.

  1. You mean the quantity is the amount of a resource that a single volume consumes, not the total capacity? Then how would the predicate handler make use of it? Can you give more details?
  2. Does this issue have anything to do with the memory resource and the PodFitsResources predicate, or is memory just an example of "arbitrary resources"?
  3. For local volume capacity, I think we need node-specific values rather than a single universal value.

@ddebroy
Member

ddebroy commented Jun 19, 2018

Looks good. One question regarding:

- name: "attachable-volumes-gce-pd"
  quantity: 1

Will the quantity setting for attachable-volumes-* ever be something other than 1? Or is there a scenario where a driver provisions a single PV that maps to (or spans) multiple EBS/PD/AzDisk volumes?

@gnufied
Member

gnufied commented Jun 19, 2018

What is the use case for supporting arbitrary resources within PVs or StorageClasses? Before we go there, I think we should gather some real-world data on how those resources would be applied and how the scheduler would have to be modified to take them into account. What happens when individual PVs can share the memory (like a shared FUSE mount)? Are the memory limits applicable to PVs very much storage-implementation specific? What happens to those resource requirements when the storage provider's version or implementation changes, such as AWS EFS vs. off-the-shelf NFS, Glusterfs vs. ceph-glusterfs, or Glusterfs v1.x vs. v2.x?

Can the volume attach limit and the memory limit of a PV conflict with each other? Which wins in that case?

@msau42
Member Author

msau42 commented Jun 19, 2018

You mean the quantity is the amount of a resource that a single volume consumes, not the total capacity?

Sorry, I wasn't clear. The total capacity is reported in the Node's resources, and the consumption of a to-be-provisioned PV is reported in the StorageClass. Something like capacity would need to be special-cased with a prefix, since the capacity is inferred from the PVC request.
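
To illustrate that split (the resource names and the "capacity-" prefix below are hypothetical, not existing API): fixed per-volume costs such as attach points are read straight from StorageClass.nodeResources, while a capacity-style resource would be special-cased by prefix and its quantity taken from the PVC request instead. A rough Go sketch:

package main

import (
    "fmt"
    "strings"
)

// Hypothetical per-volume cost entry declared on a StorageClass.
type NodeResource struct {
    Name     string
    Quantity int64
}

// volumeConsumption returns how much of each node resource a single
// to-be-provisioned volume would consume. Fixed costs come from the
// StorageClass entries; a resource with the (hypothetical) "capacity-"
// prefix is instead inferred from the PVC's storage request.
func volumeConsumption(declared []NodeResource, pvcRequestBytes int64) map[string]int64 {
    out := map[string]int64{}
    for _, r := range declared {
        if strings.HasPrefix(r.Name, "capacity-") {
            out[r.Name] += pvcRequestBytes // quantity comes from the PVC, not the class
        } else {
            out[r.Name] += r.Quantity
        }
    }
    return out
}

func main() {
    declared := []NodeResource{
        {Name: "attachable-volumes-gce-pd", Quantity: 1},
        {Name: "capacity-gce-pd", Quantity: 0}, // placeholder; real value comes from the PVC
    }
    fmt.Println(volumeConsumption(declared, 500<<30)) // a 500Gi PVC request
}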

is there a scenario where a driver provisions a single PV that maps to (or spans) multiple EBS/PD/AzDisk volumes?

@liggitt had a hypothetical example of a driver RAIDing two EBS volumes together.

What is the use case for supporting arbitrary resources within PVs or StorageClasses?

Volume drivers like NFS or Ceph have kernel components that can consume system memory and network bandwidth that otherwise go unaccounted for. That said, I agree: we probably have many other volume scaling issues that are more pressing than this. The most important resources we're trying to deal with right now are attach limits and capacity. Limits for other system resources could potentially be accounted for through attach limits in the near term, but I wanted to see if we could come up with a more general API that could be extended in the future if needed.

@msau42
Member Author

msau42 commented Jun 20, 2018

One more use case to further complicate all of this:

In addition to per-node attach limits, GCE PD also has a maximum total capacity per node of 64TB on most instance types. To handle this, I think we need to count the requested capacity of the PVCs on each node (and inline volumes won't work).
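
A hedged sketch of the kind of check this implies (plain Go, illustrative values and names; not an actual predicate): sum the requested capacity of the PD-backed PVCs already counted against the node plus the incoming ones, and fail the node if the total exceeds the per-node maximum.

package main

import "fmt"

// Per-node GCE PD capacity limit mentioned above (treated here as 64Ti
// for round binary numbers in the sketch).
const maxPDCapacityPerNode = int64(64) << 40

// fitsPDCapacity reports whether adding the incoming PVC requests (in bytes)
// to the capacity already counted against the node stays within the limit.
func fitsPDCapacity(existingBytes int64, incomingRequestBytes []int64) bool {
    total := existingBytes
    for _, r := range incomingRequestBytes {
        total += r
    }
    return total <= maxPDCapacityPerNode
}

func main() {
    existing := int64(60) << 40           // 60Ti of PD capacity already counted
    incoming := []int64{3 << 40, 2 << 40} // 3Ti + 2Ti new PVC requests
    fmt.Println(fitsPDCapacity(existing, incoming)) // false: 65Ti exceeds 64Ti
}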

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 18, 2018
@ddebroy
Member

ddebroy commented Sep 18, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 18, 2018
@msau42
Member Author

msau42 commented Sep 25, 2018

I updated the design doc on how to integrate the max PD predicate with volume topology: kubernetes/community#2711

I left out support for a volume taking up multiple attach points and for a volume consuming other node resources like memory/CPU. We don't have any concrete use cases or needs for those yet, and we can reconsider if something comes up. The design for integration is much simpler in this case.

@ddebroy
Member

ddebroy commented Sep 25, 2018

/assign

I will pick this up as we discussed. Thanks for updating the design doc!

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 24, 2018
@ddebroy
Member

ddebroy commented Dec 26, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 26, 2018
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 26, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 25, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
