Volume attach limit doesn't work for WaitForFirstConsumer volume binding mode #73724
Comments
cc @msau42
/sig storage
/assign @gnufied

Currently the predicate skips counting PVCs that are not bound. It should look up the StorageClass to get the provisioner/driver name. This solution will require a StorageClass object even if you don't use dynamic provisioning.
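The StorageClass lookup described above could look roughly like this. This is a minimal sketch using simplified, hypothetical types (`StorageClass`, `PVC`) standing in for the real Kubernetes API objects; the actual predicate would use a StorageClass lister from client-go:

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the real API types.
type StorageClass struct {
	Name        string
	Provisioner string // e.g. a CSI driver name
}

type PVC struct {
	Name             string
	StorageClassName string
}

// driverNameForPVC resolves the provisioner (driver) name for an
// unbound PVC by looking up its StorageClass. If the class is not
// found, it returns "" so the caller can skip counting the volume.
func driverNameForPVC(pvc PVC, classes map[string]StorageClass) string {
	sc, ok := classes[pvc.StorageClassName]
	if !ok {
		return ""
	}
	return sc.Provisioner
}

func main() {
	classes := map[string]StorageClass{
		"fast": {Name: "fast", Provisioner: "csi.example.com"},
	}
	fmt.Println(driverNameForPVC(PVC{Name: "data", StorageClassName: "fast"}, classes))
	fmt.Println(driverNameForPVC(PVC{Name: "logs", StorageClassName: "missing"}, classes))
}
```

Returning `""` for a missing class matches the "just ignore it" behavior discussed later in this thread.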
I had a thought that it wouldn't be so simple to count unbound PVCs of other pods, since binding is still in progress, but I think it's actually fine. We only count other pods that have been assigned to the same node. Those pods are assigned to the node AND still unbound for one of these reasons:
Either way, the fact that the Pod's nodeName is set to this node means that the pod should count toward the node's limit.
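The counting rule above (every pod already assigned to the node counts, bound or not) can be sketched as follows. Again, the types here are simplified and hypothetical; the real scheduler predicate works against the cached node info:

```go
package main

import "fmt"

// Hypothetical, simplified view of a pod's volumes for this sketch.
type Volume struct {
	Driver string // CSI driver name; "" means unknown and is skipped
	Bound  bool
}

type Pod struct {
	NodeName string
	Volumes  []Volume
}

// fitsAttachLimit counts volumes of every pod already assigned to the
// node -- bound or not, per the reasoning above -- and checks whether
// the new pod's volumes still fit under the driver's attach limit.
func fitsAttachLimit(newPod Pod, nodePods []Pod, driver string, limit int) bool {
	used := 0
	for _, p := range nodePods {
		for _, v := range p.Volumes {
			if v.Driver == driver {
				used++
			}
		}
	}
	requested := 0
	for _, v := range newPod.Volumes {
		if v.Driver == driver {
			requested++
		}
	}
	return used+requested <= limit
}

func main() {
	existing := []Pod{{NodeName: "node1", Volumes: []Volume{{Driver: "csi.example.com"}}}}
	pod := Pod{Volumes: []Volume{{Driver: "csi.example.com"}}}
	fmt.Println(fitsAttachLimit(pod, existing, "csi.example.com", 1)) // false: the one slot is taken
	fmt.Println(fitsAttachLimit(pod, existing, "csi.example.com", 2)) // true
}
```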
Hmm, so after opening the PR that fixes this, I realize that, from the StorageClass, we don't really know whether the volume we are evaluating is a CSI volume or not. I think a similar problem exists for the existing predicates, though. For example, the AzureDisk predicate does not know whether the PVC it is counting is of type AzureVolume or not. So I suspect that those 4 hardcoded predicates are counting all delayed-binding volumes.
If the storageclass.provisioner name is not found, can we just ignore it?
I think we can ignore it. The PR currently does that. |
What happened:
Volume binding mode WaitForFirstConsumer doesn't consider the volume attach limit per node.

What you expected to happen:
WaitForFirstConsumer volume binding mode should have respected the attach limit, similar to Immediate binding mode.

How to reproduce it (as minimally and precisely as possible):
In a CSI driver, I configured the volume attach limit to 1 per node on a 3-node cluster.
In Immediate volume binding mode, only 3 pods with 3 PV/PVCs run.
In WaitForFirstConsumer binding mode, all 5 pods with 5 PV/PVCs run on the 3-node cluster. (Actually, 4 pods were scheduled on 1 node and 1 on another node; the 3rd node didn't have any pod scheduled.)

Anything else we need to know?:
I was able to reproduce it with a minimal node count and configuration, and also hit this in a stress case with a 100-volume limit and >500 pods running on a 5-node cluster.
Environment:
Kubernetes version (use kubectl version):
➜ ~ kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.2", GitCommit:"17c77c7898218073f14c8d573582e8d2313dc740", GitTreeState:"clean", BuildDate:"2018-10-30T21:39:38Z", GoVersion:"go1.11.1", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T20:56:12Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider or hardware configuration: On-prem Linux server
OS (e.g. from /etc/os-release): CentOS 7
Kernel (e.g. uname -a): 3.10.0
Install tools:
Others: