
Volume attach limit doesn't work for WaitForFirstConsumer Volume binding mode #73724

Closed
sarjeet2013 opened this issue Feb 4, 2019 · 7 comments · Fixed by #73863
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@sarjeet2013 commented Feb 4, 2019

What happened:

The WaitForFirstConsumer volume binding mode doesn't take the per-node volume attach limit into account.

What you expected to happen:

The WaitForFirstConsumer volume binding mode should respect the attach limit, just as the Immediate binding mode does.

How to reproduce it (as minimally and precisely as possible):

In the CSI driver, configure the volume attach limit to 1 per node on a 3-node cluster, then create 5 pods, each with its own PV/PVC (a sketch of the objects involved follows the results below).

  • With Immediate volume binding mode, only 3 pods with 3 PV/PVCs run.
  • With WaitForFirstConsumer binding mode, all 5 pods with 5 PV/PVCs run on the 3-node cluster. (Specifically, 4 pods were scheduled on one node and 1 on another; the third node had no pods scheduled.)
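
For concreteness, here is a minimal sketch of the objects involved, written with the k8s.io/api Go types of that era; the driver name example.csi.driver.io and the object names are placeholders for whatever the actual setup used.

```go
// Sketch only: a delayed-binding StorageClass for a CSI driver that reports
// an attach limit of 1 per node, plus one of the PVCs used by the test pods.
// With 5 such PVC/pod pairs on a 3-node cluster, at most 3 pods should run
// once the limit is respected.
package repro

import (
	corev1 "k8s.io/api/core/v1"
	storagev1 "k8s.io/api/storage/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func reproObjects() (*storagev1.StorageClass, *corev1.PersistentVolumeClaim) {
	wffc := storagev1.VolumeBindingWaitForFirstConsumer

	sc := &storagev1.StorageClass{
		ObjectMeta:        metav1.ObjectMeta{Name: "csi-delayed"},
		Provisioner:       "example.csi.driver.io", // placeholder CSI driver name
		VolumeBindingMode: &wffc,                   // WaitForFirstConsumer
	}

	scName := sc.Name
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "data-0"},
		Spec: corev1.PersistentVolumeClaimSpec{
			StorageClassName: &scName,
			AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			Resources: corev1.ResourceRequirements{ // PVC resources field as in the 1.13-era API
				Requests: corev1.ResourceList{
					corev1.ResourceStorage: resource.MustParse("1Gi"),
				},
			},
		},
	}
	return sc, pvc
}
```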

Anything else we need to know?:

I was able to reproduce it with a minimal node count and configuration, and also hit this in a stress case with a volume limit of 100 and >500 pods running on a 5-node cluster.

Environment:

  • Kubernetes version (use kubectl version):

➜ ~ kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-30T21:39:38Z", GoVersion:"go1.11.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T20:56:12Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration: On-prem Linux server

  • OS (e.g. from /etc/os-release): CentOS 7

  • Kernel (e.g. uname -a): 3.10.0

  • Install tools:

  • Others:

@sarjeet2013 sarjeet2013 added the kind/bug Categorizes issue or PR as related to a bug. label Feb 4, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Feb 4, 2019
@sarjeet2013 (Author)

cc @msau42

@sarjeet2013 (Author)

/sig storage

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 4, 2019
@msau42 (Member) commented Feb 4, 2019

/assign @gnufied

Currently the predicate skips counting PVCs that are not bound. It should look up the StorageClass to get the provisioner/driver name.

This solution will require a StorageClass object even if you don't use dynamic provisioning.
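
Roughly, that lookup could be sketched as below; the function name driverNameForPVC and the lister parameters are illustrative, not the actual scheduler predicate code.

```go
// Sketch only: resolve which CSI driver a PVC's volume belongs to. For a
// bound PVC the driver comes from the PV's CSI source; for an unbound PVC
// (e.g. WaitForFirstConsumer binding still in progress) fall back to the
// StorageClass provisioner name, as suggested above.
package sketch

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	corelisters "k8s.io/client-go/listers/core/v1"
	storagelisters "k8s.io/client-go/listers/storage/v1"
)

func driverNameForPVC(pvc *v1.PersistentVolumeClaim,
	pvLister corelisters.PersistentVolumeLister,
	scLister storagelisters.StorageClassLister) (string, error) {

	// Bound PVC: the PV already records the driver.
	if pvc.Spec.VolumeName != "" {
		pv, err := pvLister.Get(pvc.Spec.VolumeName)
		if err != nil {
			return "", err
		}
		if pv.Spec.CSI == nil {
			return "", fmt.Errorf("PV %s is not a CSI volume", pv.Name)
		}
		return pv.Spec.CSI.Driver, nil
	}

	// Unbound PVC: use the StorageClass provisioner as the driver name.
	scName := pvc.Spec.StorageClassName
	if scName == nil || *scName == "" {
		return "", fmt.Errorf("PVC %s/%s has no StorageClass", pvc.Namespace, pvc.Name)
	}
	sc, err := scLister.Get(*scName)
	if err != nil {
		return "", err // e.g. StorageClass missing; see the discussion below
	}
	return sc.Provisioner, nil
}
```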

@msau42 (Member) commented Feb 5, 2019

I initially thought it wouldn't be so simple to count unbound PVCs of other pods, since binding is still in progress, but I think it's actually fine. We only count other pods that have been assigned to the same node. Those pods are assigned to the node AND still unbound for one of these reasons:

  • The pod has actually been scheduled to the node and someone manually came in and unbound the PVC: unlikely, and an unsupported sequence since we implemented StorageProtection
  • The Pod was initially created with nodeName already set, bypassing the scheduler
  • The Pod is going to be scheduled (it's been assumed), and it's currently going through binding. This is normal with delayed binding.

Either way, the fact that the Pod's nodeName is set to this node means that the pod should count towards the node's limit.
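
A sketch of the counting this implies (names are illustrative, and resolveDriver stands in for a lookup like the one sketched earlier): every pod whose spec.nodeName is already set to this node contributes its PVC-backed volumes for the driver in question, whether or not those PVCs have finished binding, and a volume shared by several pods is only counted once.

```go
// Sketch only: count the distinct PVC-backed volumes already "spoken for"
// on a node for a given driver. nodePods are the pods whose spec.nodeName
// is set to this node (scheduled or assumed); resolveDriver returns
// ok=false when the driver cannot be determined.
package sketch

import v1 "k8s.io/api/core/v1"

func volumesUsedOnNode(nodePods []*v1.Pod, driver string,
	resolveDriver func(namespace, claimName string) (string, bool)) int {

	seen := map[string]bool{}
	for _, pod := range nodePods {
		for _, vol := range pod.Spec.Volumes {
			if vol.PersistentVolumeClaim == nil {
				continue // only PVC-backed volumes count against the limit
			}
			claim := vol.PersistentVolumeClaim.ClaimName
			name, ok := resolveDriver(pod.Namespace, claim)
			if !ok || name != driver {
				continue
			}
			// Key by namespace/claim so a volume shared by several pods on
			// the same node is only counted once.
			seen[pod.Namespace+"/"+claim] = true
		}
	}
	return len(seen)
}
```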

@gnufied (Member) commented Feb 8, 2019

Hmm, so after opening the PR that fixes this, I realize that from the StorageClass we don't really know whether the volume we are evaluating is a CSI volume or not. I think a similar problem exists for the existing predicates, though. For example, the AzureDisk predicate does not know whether the PVC it is over-counting is of the AzureDisk type or not.

So, I suspect that those 4 hardcoded predicates are counting all delayed-binding volumes.

@msau42 (Member) commented Feb 8, 2019

if the storageclass.provisioner name is not found, can we just ignore it?

@gnufied (Member) commented Feb 8, 2019

I think we can ignore it. The PR currently does that.
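
To make "ignore it" concrete, here is a sketch of the call-site behavior, reusing the hypothetical driverNameForPVC helper from the earlier sketch: any failure to resolve the provisioner/driver simply means the volume is not counted.

```go
// Sketch only: how "just ignore it" looks at the call site. Any lookup
// failure (PVC not found, empty storageClassName, StorageClass deleted, ...)
// means the volume is skipped rather than counted or treated as an error.
package sketch

import (
	corelisters "k8s.io/client-go/listers/core/v1"
	storagelisters "k8s.io/client-go/listers/storage/v1"
)

func resolverIgnoringUnknown(pvcLister corelisters.PersistentVolumeClaimLister,
	pvLister corelisters.PersistentVolumeLister,
	scLister storagelisters.StorageClassLister) func(namespace, claimName string) (string, bool) {

	return func(namespace, claimName string) (string, bool) {
		pvc, err := pvcLister.PersistentVolumeClaims(namespace).Get(claimName)
		if err != nil {
			return "", false // PVC not found: ignore
		}
		driver, err := driverNameForPVC(pvc, pvLister, scLister)
		if err != nil {
			return "", false // provisioner/driver unknown: ignore
		}
		return driver, true
	}
}
```

The effect, as the exchange above suggests, is to err on the side of not counting a volume rather than failing scheduling for a claim whose driver cannot be determined.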
